The Great Equalizer?

Consumer  Choice  Behavior  at  Internet  Shopbots 


Abstract 

Our research empirically analyzes consumer behavior at Internet shopbots, sites that allow consumers
to  make  "one-click"  price  comparisons  for  product  offerings  from  multiple  retailers.  By  allowing 
researchers  to  observe  exactly  what  information  the  consumer  is  shown  and  their  search  behavior  in 
response  to  this  information,  shopbot  data  has  unique  strengths  for  analyzing  consumer  behavior. 
Furthermore,  the  method  in  which  the  data  is  displayed  to  consumers  lends  itself  to  a  utility-based 
evaluation  process,  consistent  with  econometric  analysis  techniques. 

While  price  is  an  important  determinant  of  customer  choice,  we  find  that,  even  among  shopbot 
consumers,  branded  retailers  and  retailers  a  consumer  visited  previously  hold  significant  price 
advantages  in  head-to-head  price  comparisons.  Further,  customers  are  very  sensitive  to  how  the  total 
price is allocated among the item price, the shipping cost, and tax, and are also quite sensitive to the
ordinal  ranking  of  retailer  offerings  with  respect  to  price.  We  also  find  that  consumers  use  brand  as  a 
proxy  for  a  retailer's  credibility  with  regard  to  non-contractible  aspects  of  the  product  bundle  such  as 
shipping  time.  In  each  case  our  models  accurately  predict  consumer  behavior  out  of  sample,  suggesting 
that  our  analyses  effectively  capture  relevant  aspects  of  consumer  choice  processes. 


(Internet; Choice Models; Brand; Service Quality; Partitioned Pricing; Intermediaries)

1.     Introduction


"The  Internet  is  a  great  equalizer,  allowing  the  smallest  of  businesses  to  access  markets 
and  have  a  presence  that  allows  them  to  compete  against  the  giants  of  their  industry." 

Jim Borland, Knight Ridder (1998)^1

"The  cost  of  switching  from  Amazon  to  another  retailer  is  zero  on  the  Internet.  It's  just 
one  click  away." 

Thomas Friedman, New York Times (1999)^2

"Shopbots  deliver  on  one  of  the  great  promises  of  electronic  commerce  and  the 
Internet: a radical reduction in the cost of obtaining and distributing information."

Greenwald  and  Kephart  (1999) 


Two  decades  ago  information  technology  and  bar  code  scanners  radically  reduced  the  cost  of  tracking 
and  recording  consumer  purchases.  A  pioneering  paper  by  Guadagni  and  Little  (1983)  used  these  data 
to  estimate  a  multinomial  logit  model  to  analyze  attribute-based  consumer  decision  making  in  a  retail 
environment.  The  results  and  extensions  of  their  research  (e.g.,  Kamakura  and  Russell  1989;  Fader  and 
Hardie 1996) have since been widely applied by academic researchers and by industry analysts for
market  forecasting,  new  product  development,  and  pricing  analysis. 

Today, continued reductions in computing cost and the rise of commercial uses of the Internet augur a
similar revolution in retailing and consumer analysis. Our research seeks to apply multinomial logit
models as a first step in understanding consumer behavior in Internet markets.

A better understanding of Internet markets could be particularly important in markets served by Internet
shopbots. The Internet has been called "The Great Equalizer" because the technological capabilities of
the medium reduce buyer search and switching costs and eliminate spatial competitive advantages that
retailers would enjoy in a physical marketplace. Internet shopbots are emblematic of this capability.


^1 Borland, Jim. 1998. "Move Over Megamalls, Cyberspace Is the Great Retailing Equalizer." Knight Ridder/Tribune
Business News, April 13.
^2 Friedman, Thomas L. 1999. "Amazon.you." New York Times, February 26, p. A21.
Shopbots are Internet-based services that provide one-click access to price and product information
from numerous competing retailers. In so doing, they substantially reduce buyer search costs for product
and price information.^3 They also strip away many of the accoutrements of a retailer's brand name by
listing only summary information from both well- and lesser-known retailers.^4 Further, every retailer at a
shopbot  is  "one  click  away,"  reducing  switching  costs  accordingly.  In  each  instance  these  factors  should 
serve  to  increase  competition  and  reduce  retailer  margins  in  markets  served  by  shopbots  —  an  effect 
that  should  be  felt  most  strongly  for  homogeneous  physical  goods  (e.g.,  Bakos  1997). 

One  wonders,  then,  what  will  happen  to  a  retailer's  brand  equity  and  consumer  loyalty  in  the  presence 
of  shopbots.  Amazon.com  has  invested  hundreds  of  millions  of  dollars  in  developing  its  online  brand 
position.  Likewise,  brick-and-mortar  retailers  such  as  Barnes  &  Noble  and  Borders  are  attempting  to 
transfer  the  value  of  their  existing  brand  names  to  online  markets. 

Our  research  addresses  these  questions  by  analyzing  consumer  behavior  through  panel  data  gathered 
from an Internet shopbot. We use these data to study four major aspects of Internet shopbot markets.
First,  we  analyze  how  consumers  respond  to  the  presence  of  retailer  brand  names.  Second,  we  analyze 
consumer  response  to  partitioned  pricing  strategies  (separating  total  price  into  item  price,  shipping  cost, 
and sales tax). Third, we use Internet cookie data to analyze consumer loyalty to retailers they had
visited  previously.  Fourth,  we  use  the  responses  of  observable  groups  of  consumers  to  analyze  how 
consumers  respond  differently  to  contractible  aspects  of  the  product  bundle  versus  non-contractible 
aspects  such  as  promised  delivery  times.  In  addition,  we  analyze  the  correspondence  between  predicted 
and actual consumer behavior to assess the reliability of our models and the potential for retailers to use
shopbot data to facilitate dynamic or personalized pricing strategies.

We find that branded retailers and retailers a customer had dealt with previously are able to charge
$1.13 and more than their rivals, ceteris paribus. Furthermore, our models demonstrate that consumers
use brand name as a signal of a retailer's reliability in delivering on promised non-contractible aspects of
the product bundle. Consumer loyalty can also provide pricing power; consumers are willing to pay an
average of $2.49 more to buy from a retailer they have visited previously. Potential sources for the
importance of brand and loyalty include service quality differentiation, asymmetric quality information,
and cognitive lock-in. We also find that shopbot consumers are significantly more sensitive to changes in
shipping cost than they are to changes in item price, in contrast to what would be expected from a
straightforward application of utility theory and rational consumer behavior. Lastly, we find a high
correspondence between predicted and actual consumer behavior in our data, suggesting that our
models capture relevant aspects of consumer decision-making. We also note that retailers may be able
to use the predictability of consumer behavior demonstrated in these models to facilitate personalized
pricing strategies.

^3 To illustrate this, we had a group of students compare the time needed to gather price quotes through various
means. They found that gathering 30 price quotes took 3 minutes using an Internet shopbot, 30 minutes by visiting
Internet retailers directly, and 90 minutes by making phone calls to physical stores. In practice, shopbots also
introduce buyers to numerous retailers who would otherwise remain unknown to them.
^4 This characteristic of shopbots was the subject of recent litigation between eBay and BiddersEdge.com.

Our  approach  to  analyzing  electronic  markets  differs  from  recent  empirical  studies  in  that  it  examines  the 
responses  of  actual  consumers  to  prices  set  by  retailers,  not  just  the  retailers'  pricing  behavior.  Research 
analyzing  retailer  pricing  strategies  has  been  used  to  characterize  the  relative  efficiency  of  electronic  and 
physical  markets  (Bailey  1998;  Brynjolfsson  and  Smith  2000),  retailer  differentiation  strategies  (Clay, 
Krishnan, Wolff, and Fernandes 1999), and price discrimination strategies (Clemons, Hann, and Hitt 1998).
However,  retailer  pricing  strategies  provide  only  second-order  evidence  of  consumer  behavior  in 
electronic  markets. 

In this regard, shopbots provide Internet researchers with a unique opportunity to analyze actual
consumer behavior in Internet markets. At Internet shopbots, thousands of consumers a day search for
product information on different books. Their searches return comparison tables with a great deal of
variation across retailers in relative price levels, delivery times, and product availability. Consumers then
evaluate the product information and make an observable choice by clicking on a particular product
offer. The result is a powerful laboratory where Internet researchers can observe snapshots of consumer
behavior and, by tracking cookie numbers, consumer behavior over time.

The data available at Internet shopbots have several natural parallels to grocery store scanner data.
First,  shopbot  data  present  consumer  decisions  made  in  response  to  a  choice  between  several 
alternatives.  Second,  salient  product  attributes  are  observable  by  both  consumers  and  researchers. 
Third, consumer behavior can be tracked over time. The relative strengths and weaknesses of shopbot
data  when  compared  to  scanner  data  are  discussed  in  more  detail  below. 

The remainder of this paper is organized in four parts. Section 2 addresses the data we collect, how it
was collected, and its strengths and limitations. Section 3 discusses the empirical models we use to
analyze our data. Section 4 presents our results. Section 5 concludes and discusses implications of our
results and areas for future research.

2.     Data 

2.1.    Data Source

We use panel data collected from EvenBetter.com to analyze consumer behavior at Internet shopbots.
We selected EvenBetter for four reasons. First, EvenBetter sells books: well-defined homogeneous
physical goods in a relatively mature Internet market. By analyzing shopping behavior in markets for
homogeneous  goods,  we  are  able  to  control  for  systematic  differences  in  the  physical  products  through 
our  methodological  design.  Additionally,  homogeneous  physical  goods  provide  a  useful  reference  point 
for  the  importance  of  brand  and  retailer  loyalty  because  they  should  experience  strong  price  competition 
in the presence of markets with low search costs (Bakos 1997). Examining relatively mature Internet
markets  ensures  a  sufficient  number  of  consumers  and  retailers  to  draw  meaningful  conclusions. 

A second reason for choosing EvenBetter is that their service offers consumers a more detailed list of
product  attributes  than  most  other  shopbots  for  books.  This  information  includes  separate  fields  for  the 
total  price,  item  price,  shipping  cost,  sales  tax,  delivery  time,  shipping  time,  and  shipping  service.  Third, 
EvenBetter  does  not  offer  priority  listings  to  retailers  who  pay  an  extra  fee  (as  do  some  other  shopbots; 
e.g.,  MySimon.com).  An  unbiased  listing  of  retailers  provides  a  clearer  interpretation  of  the  factors 
driving  consumers'  choices.  Fourth,  EvenBetter.com  has  a  revenue  sharing  arrangement  with  many  of  its 
retailers  allowing  us  to  compare  descriptive  statistics  for  the  relative  sales  conversion  ratios  of  the 
different  retailers. 
A disadvantage of using data gathered from Internet shopbots is that our analysis is restricted to
consumers who choose to use a shopbot. Consumers who choose to use a shopbot are likely to be
systematically different from consumers who visit Internet retailers directly. Thus, our logit model
predictions must be understood as being conditioned on a consumer choosing to use a shopbot.
Conditioning on prior consumer choice in this way does not bias multinomial logit results (Ben-Akiva
and Lerman 1985). Furthermore, in analyzing the effect of this self-selection bias on our results, it seems
reasonable to assume that shopbot consumers are more price sensitive than typical Internet consumers
are. Thus, our estimates of brand and loyalty effects are likely to be lower bounds on the importance of
brand and loyalty among the broader population of Internet consumers.
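
As a rough illustration of the kind of model referred to here, the following is a minimal sketch of multinomial logit choice probabilities over the offers shown in a single session. The attribute names, the linear utility specification, and the coefficient values are hypothetical placeholders chosen for the example; they are not estimates from this study.

```python
import math

# Hypothetical coefficients for illustration only (not estimates from the study).
BETA = {"total_price": -0.20, "branded": 0.30}


def utility(offer):
    """Linear-in-attributes utility for one offer (assumed specification)."""
    return sum(BETA[name] * offer[name] for name in BETA)


def choice_probabilities(offers):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j) over the offers shown."""
    utilities = [utility(o) for o in offers]
    shift = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# One hypothetical session with three offers for the same ISBN.
session = [
    {"total_price": 22.25, "branded": 0},
    {"total_price": 23.95, "branded": 1},
    {"total_price": 24.11, "branded": 0},
]
probs = choice_probabilities(session)
```

Because the probabilities are defined only over the offers displayed in a session, they are naturally read as conditional on the consumer having chosen to use the shopbot.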

2.2.    Data  Characteristics 

EvenBetter's shopbot operates similarly to many other Internet shopbots. A consumer who wants to
purchase a book visits EvenBetter and searches on the book's title or author, ultimately identifying a
unique ISBN as the basis for their search.^5 EvenBetter then queries 47 distinct book retailers to check
whether they have the book in stock and to obtain their prices and delivery times. The prices and delivery times
are queried in real time and thus represent the most up-to-date data from the retailer. Because the
prices are gathered directly from the retailers, they are the same prices that are charged to consumers
who visit the retailer site directly.^6

Prices are displayed in offer comparison tables (e.g., Figure 1). These tables list the total price for the
book and the elements of price (item price, shipping cost, and applicable sales taxes) along with the
retailer's name and the book's delivery information. If a retailer provides multiple shipping options at
multiple prices (e.g., express, priority, book rate) the table lists separate offers for each shipping
option.^7


^5 International Standard Book Numbers (ISBNs) uniquely identify the individual version of the book (e.g., binding
type, printing, and language). Because EvenBetter's search results are based on a single ISBN, all of the products
returned in response to a search are physically identical.
^6 This fact is surprising, as one might expect retailers to use shopbots as a price discrimination tool, charging lower
prices to consumers who reveal a higher price sensitivity by virtue of using a shopbot.
^7 For example, in the offer comparison table in Figure 1, note that Kingbooks.com has separate listings for their book
rate, standard, and 2-day shipping services.
Figure 1: Sample Screen from EvenBetter.com
By default, the table is sorted by total price; however, the consumer can sort based on any of the nine
columns in the comparison table. After the consumer has evaluated the information, they can click
through on a particular offer and are taken directly to the retailer in question to finalize their purchase.
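
The structure of an offer comparison table can be sketched as follows. This is a minimal illustration, assuming a simplified offer record with only the price components; the class name, field names, retailer names, and prices are hypothetical and are not taken from the EvenBetter data.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    retailer: str
    item_price: float
    shipping_cost: float
    sales_tax: float = 0.0  # zero when no sales tax applies to the offer

    @property
    def total_price(self) -> float:
        # Total price = item price + sales tax + shipping cost
        return round(self.item_price + self.sales_tax + self.shipping_cost, 2)


def rank_offers(offers):
    """Return (rank, offer) pairs sorted by total price, the default sort order."""
    ordered = sorted(offers, key=lambda o: o.total_price)
    return list(enumerate(ordered, start=1))


# Hypothetical offers for a single ISBN.
offers = [
    Offer("RetailerA", 20.00, 3.95),
    Offer("RetailerB", 18.50, 4.50, 1.11),
    Offer("RetailerC", 19.25, 3.00),
]
table = rank_offers(offers)
```

Note that the retailer with the lowest item price (RetailerB in this made-up example) need not appear first once shipping and tax are included, which is why the partitioning of total price is worth modeling separately.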

We collect four categories of data from EvenBetter.com: offer data, session data, consumer data, and
choice data (Table 1). We define an offer as an individual price quote from a retailer or, equivalently,
an individual entry in an offer comparison table. Our offer data include separate variables for each of the
nine columns in the offer comparison table: total price, item price, sales tax (if applicable),^8 retailer
name, shipping cost, shipping time, shipping service, and total delivery time.^9 Rank is the numerical
position of the offer in the table.

^8 The tax law during our study period stated that retailers had to charge sales tax only to consumers who lived in
states where the retailer had a physical location (a.k.a. nexus). Furthermore, several companies have argued that their
Internet operations are legally separate from the physical operations of the parent company. Thus,
barnesandnoble.com must only charge tax in New York (where its headquarters is located) and New Jersey (where it
has a distribution warehouse) even though its parent company, Barnes & Noble, has operations in all 50 states.

Table 1: Shopbot Data Collected

Variable                    Description
--------------------------  ------------------------------------------------------------------------
Offer Data
  Total Price               Total price for the offer (item price plus sales tax plus shipping cost)
  Item Price                The price for the item
  Shipping Cost             The price for shipping
  State Sales Tax           Sales tax (if applicable)
  No Tax                    = 1 if there is no sales tax on the offer
  Retailer                  Retailer name (used to create dummy variables for each retailer)
  Shipping Time             Time to ship product from retailer to consumer (Min, Max, Average)
  Acquisition Time          Time for retailer to acquire product (Min, Max, Average)
  Delivery Time             Shipping time plus acquisition time (Min, Max, Average)
  Shipping Method           Priority (1-day or 2-day), Standard (3-7 day), Book Rate (>7 day)
  Delivery NA               = 1 if retailer can't quote acquisition time on book
  Rank                      The position of the offer in the comparison table
Session Data
  Date/Time                 Date and time search occurred
  ISBN                      ISBN number of book searched for (used to calculate book type)
  Sort Column               Identifies which column the consumer sorted on (default is total price)
Consumer Data
  Cookie Number             Unique identifier for consumers who leave their cookies on
  Cookies On                = 1 if the consumer has their cookies on
  Country                   Which country the consumer says they are from
  U.S. State                Which state the consumer says they are from (U.S. consumers only)
Choice Data
  Last Click-Through        = 1 if the consumer's last click-through was on this offer
  Click-Through             = 1 if the consumer clicked on this offer
Loyalty Data
  Prior Last Click-Through  = 1 if the consumer last clicked through on this retailer on most recent visit
  Prior Click-Through       = 1 if the consumer clicked through on this retailer on their most recent visit

We also track a variable we call "delivery 'N/A.'" In some instances, retailers are unable to determine
how long it will take them to acquire the book from their distributor. When this occurs, EvenBetter lists
"N/A" in the delivery time field (but still lists a numerical value in the shipping time field). We capture this
situation with a dummy variable that takes on the value of 1 whenever "N/A" is listed in the delivery time
column. We model this by assuming that the consumer infers the total delivery time as the quoted
shipping time plus an unknown constant (captured by the dummy variable).
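
The treatment of "N/A" delivery times described above can be sketched as follows. This is a minimal illustration; the function name and the use of `None` to represent an unquoted acquisition time are assumptions made for the example.

```python
def delivery_features(shipping_days, acquisition_days):
    """Return (delivery_days, delivery_na) for one offer.

    acquisition_days is None when the retailer cannot quote an acquisition
    time (an assumed encoding of the "N/A" case). In that case the delivery
    time is set to the quoted shipping time, and the dummy variable absorbs
    the unknown constant.
    """
    if acquisition_days is None:
        return shipping_days, 1
    return shipping_days + acquisition_days, 0
```

For example, an offer with a 3-day shipping time and a 2-day acquisition time yields `(5, 0)`, while the same offer with no quoted acquisition time yields `(3, 1)`.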


^9 Total delivery time is the sum of shipping time and acquisition time (the amount of time it takes for the retailer to
obtain  the  book  from  their  distributor). 
From our offer data we impute two additional sets of dummy variables relating to the type of shipping
associated  with  the  offer  and  the  position  of  the  offer  in  the  comparison  table.  To  construct  dummy 
variables  associated  with  shipping  service  we  use  the  fact  that  the  shipping  services  offered  by  retailers 
generally  fall  into  three  categories:  express  shipping  (typically  a  1-2  day  shipping  time),  priority  shipping 
(3-6  day  shipping  time),  and  book  rate  (greater  than  7  day  shipping  time).  We  generate  dummy 
variables  for  each  category  of  shipping  service.  We  also  generate  dummy  variables  for  the  first  offer  in 
the  comparison  table  and  the  first  screen  of  offers  displayed  (i.e.,  the  first  10  offers)  in  the  comparison 
table. 
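
A minimal sketch of how these imputed dummy variables might be constructed is given below. The function and key names are hypothetical. The category boundaries follow the approximate ranges given above; because those ranges leave a shipping time of exactly 7 days unassigned, the sketch assumes it belongs to the middle category, consistent with the "3-7 day" range listed in Table 1.

```python
def offer_dummies(shipping_days, rank, screen_size=10):
    """Build shipping-service and table-position dummies for one offer.

    shipping_days: quoted shipping time in days.
    rank: 1-based position of the offer in the comparison table.
    screen_size: number of offers on the first screen (10 in the text).
    """
    return {
        "express": int(shipping_days <= 2),
        # Assumption: exactly 7 days is treated as the middle category.
        "priority": int(2 < shipping_days <= 7),
        "book_rate": int(shipping_days > 7),
        "first_offer": int(rank == 1),
        "first_screen": int(rank <= screen_size),
    }
```

Exactly one of the three shipping dummies is set for any offer, so one category would need to be omitted as the baseline when the dummies enter a model with an intercept or a full set of retailer dummies.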

Our  second  type  of  data  is  session  data.  We  define  a  session  as  an  individual  search  occasion  for  a 
book,  or  equivalently  data  that  is  common  to  an  individual  offer  comparison  table.  Our  session  data 
include  the  date  and  time  the  book  search  occurred,  the  ISBN  the  consumer  searched  for,  and  whether 
the  consumer  chose  to  sort  the  offer  comparison  table  based  on  a  column  other  than  total  price  (the 
default). 

Our consumer data include fields for the consumer's unique cookie number,^10 whether the consumer
had turned their cookies off (which occurred for 2.9% of the sessions), and the consumer's state and
country location. The state and country data are self-reported and allow the shopbot to accurately
calculate local currency, taxes, and delivery times.

Our  choice  data  are  made  up  of  two  fields.  A  "click-through"  field  captures  whether  a  consumer 
"examines"  an  offer  from  a  particular  retailer.  Since  16%  of  the  consumers  in  our  sample  look  at 
multiple retailers, we use a separate field to record the last click-through made by each consumer during
a  session.  We  use  this  as  a  proxy  for  the  offer  selected  by  the  consumer.  As  noted  in  section  2.4,  the 
click-through  variable  does  not  appear  to  be  biased  with  regard  to  sales  in  a  way  that  would  affect  our 
conclusions. 


^10 The cookie number is a unique identifier that is stored on the computer's hard drive by the retailer or shopbot. The
retailer  can  query  this  number  on  subsequent  visits  to  the  retailer's  site  and  thereby  uniquely  identify  the  consumer's 
computer. 
Using our consumer and click-through data we construct two additional variables to help us control for
consumer heterogeneity (Guadagni and Little 1983) and to track consumer loyalty over time: Prior
Click and Prior Last Click. Prior Click is a dummy variable taking on the value 1 for retailers the
consumer  clicked  on  in  the  most  recent  visit  but  did  not  "last  click."  Similarly,  Prior  Last  Click  is  a 
dummy  variable  taking  on  the  value  1  for  retailers  the  consumer  "last  clicked"  on  in  the  most  recent  visit. 
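
The construction of these two loyalty variables can be sketched as follows. This is a minimal illustration, assuming that the click-throughs from a consumer's most recent prior visit are available as an ordered list of retailer names; the function name and data layout are hypothetical.

```python
def loyalty_dummies(prior_clicks, retailers):
    """Compute Prior Click and Prior Last Click for each retailer in a session.

    prior_clicks: retailer names clicked during the consumer's most recent
        prior visit, in click order (empty if there is no prior visit).
    retailers: retailer names appearing in the current comparison table.
    """
    last = prior_clicks[-1] if prior_clicks else None
    # Retailers clicked on in the prior visit that were not the last click.
    earlier = set(prior_clicks) - {last}
    return {
        r: {
            "prior_click": int(r in earlier),
            "prior_last_click": int(r == last),
        }
        for r in retailers
    }
```

Under these definitions the two dummies are mutually exclusive for a given retailer: a retailer that was clicked more than once and was also the last click is coded only as Prior Last Click.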

2.3.    Data  Advantages  and  Limitations 

It  is  important  to  note  that  shopbot  data  have  unique  advantages  and  notable  limitations  when  compared 
to  grocery  store  scanner  data  (see  Table  2  for  summary).  One  advantage  of  shopbot  data  is  that  a 
higher  proportion  of  shopbot  consumers  use  identification  (cookies)  than  scanner  data  consumers 
(scanner cards). As noted above, 97.1% of the Internet consumers in our sample left their cookies on,
whereas  typically  less  than  80%  of  grocery  store  consumers  use  scanner  cards  to  make  their  purchases. 
Likewise,  the  shopbot  does  not  need  to  establish  special  incentives  to  have  consumers  identify 
themselves.  Most  consumers  leave  their  cookies  on  out  of  ignorance,  habit,  or  convenience.  In  a 
grocery  store  setting,  consumers  must  be  given  incentives  in  the  form  of  special  discounts  or  coupons  to 
apply for and use scanner cards.^11

At the same time there are several limitations to the use of cookies to identify consumers. Internet
consumers may have more than one computer, and thus more than one cookie. Further, some
computers (e.g., pooled computers at universities) may be shared by more than one user (while having
a single cookie number).^12 Consumers may also periodically destroy their cookies,^13 making it difficult to
track behavior from cookie to cookie.^14 Lastly, we are unable to observe consumer behavior at other
Internet sites (e.g., other shopbots or product retailers) or outside the sample window.^15 However,
these limitations bias our results in known ways. Specifically, they should not affect our calculations with
regard to brand and should bias our loyalty results negatively as compared to a situation where we
knew with certainty each consumer's prior behavior.

^11 Another advantage of Internet data is that consumer identification can be transferred between sites. This is the
approach used by firms such as DoubleClick and MediaMetrix. While our data does not use cross-site identification,
this is a potentially fruitful application for analysis (see Johnson, Bellman, Lohse 2000 for example).
^12 This problem is becoming less of a concern with the prevalence of operating systems with separate login names for
individual users and segmented user files including cookies (e.g., Windows NT, Mac OS 9, Linux).
^13 For example, by deleting the file containing the cookies.
^14 Some retailers (e.g., Amazon.com) overcome this limitation by using consumer login names to identify consumers.
This login name can then be associated with multiple cookie numbers intra- or inter-temporally. While our data does
not make use of this feature to identify consumers, this technique provides a potentially useful capability to increase
the reliability of Internet cookie data for future research.

Table 2: Summary of Data Advantages and Limitations

Characteristic               Advantage                      Limitation
---------------------------  -----------------------------  -----------------------------
Accuracy of consumer         Higher proportion of           Cookie data for consumer
identification               Internet consumers use         identification less reliable
                             identification (cookies).      than scanner cards.

Accuracy of offer data       Highly reliable knowledge      Coupon availability and use
                             of competing offers and        not observed directly.
                             prices.

Observability of consumer    Observe consumer search        Purchases not observed
behavior                     behavior (click-through        directly (only click-through
                             versus last click-through).    observed).

Applicability to utility-    Offers are presented with      Model limited to factors
based choice models          individual product             driving click-through, not
                             attributes in sortable table.  necessarily purchase.

Another  advantage  of  Internet  shopping  data  as  compared  to  scanner  data  is  the  amount  and  quality  of 
data  that  can  be  collected.  With  our  data  we  know  exactly  which  offers  were  shown  to  consumers  and 
the  order  in  which  the  offers  were  displayed.  In  scanner  data  sets,  only  the  prices  of  products  that  are 
purchased  are  collected  directly.  Thus,  the  prices  and  stock  characteristics  of  competing  offers  must  be 
inferred  from  available  data  on  purchases  within  the  scanner  data  sample.  This  provides  an  imperfect 
signal  of  the  prices  and  stock  conditions  for  competing  offers  in  scanner  data  sets  (see  Erdem,  Keane, 
and  Sun  1999  for  a  discussion  of  this  problem  and  an  approach  to  address  it). 

However, a limitation of our data in this regard is that we do not observe the use of coupons in our
Internet data, whereas coupons are readily observable in scanner data sets. Thus, a consumer's knowledge
that a particular Internet retailer had a "$10 off" coupon would increase the particular retailer's brand
effect during the time frame the coupon was available. During our sample period we did not observe
significant use of coupons (with one possible exception, noted in section 4). However, we did not track


Similar limitations affect scanner data. Market researchers are unable to observe whether a scanner data consumer
had purchased a particular brand in a shopping trip outside the sample window or in a shopping trip to another
grocery chain.
couponing systematically. Future research could track the availability of coupons by querying Internet
coupon aggregation sites (e.g., a2zdeals.com, dealcatcher.com, tiacat.com, slickdeals.com).

A third advantage of our data is that the manner in which offers are presented is particularly applicable
to utility-based models of consumer behavior. Shopbot data are presented in a comparison matrix
where the different attributes of each product are readily available and can be easily evaluated and
compared. In contrast, the attributes of products in a scanner data context are more difficult to compare
directly. Decreasing the effort necessary to compare the different attributes of a bundle should improve
the accuracy of a consumer's latent utility calculations (Morwitz, Greenleaf and Johnson 1998).

A  final  advantage  of  our  data  is  that  by  comparing  the  click-through  field  to  the  last  click-through  field 
(see Table 1) we can analyze consumer search behavior: which retailers do consumers examine before
they  make  their  final  selection.  In  a  grocery  store  setting,  this  would  be  equivalent  to  observing  a 
consumer  pick  up  a  particular  item,  look  at  it,  but  ultimately  put  it  down  and  choose  a  different  item  — 
data  that  could  only  be  gathered  at  a  very  high  cost  in  physical  stores. 

However,  as  above,  these  advantages  come  with  limitations.  In  our  data  set  we  only  observe  the 
consumer's  click-through  choices  —  we  do  not  observe  their  final  purchase  directly.  This  is  a  significant 
limitation as compared to scanner data settings where purchases are readily apparent. However,
because  of  associate  program  relationships  between  the  shopbot  and  most  of  its  retailers  we  are  able  to 
determine  whether  purchase  behavior  is  biased  in  a  way  that  would  impact  our  empirical  analysis.  We 
discuss  this  issue  in  more  detail  in  the  methodology  section. 

2.4.    Descriptive  Data 

Our data set was gathered over 69 days from August 25 to November 1, 1999. To simplify
interpretation, we limit our analysis to prices for U.S.-based consumers (75.4% of sessions), sessions
that lead to at least one click-through (26.3% of remaining sessions) and sessions that return more than
one retailer (99.9% of remaining sessions). The resulting data set contains 1,513,439 book offerings
from 39,654 searches conducted by 20,227 distinct consumers. Included in this data set are 7,478
repeat visitors, allowing us to track consumer behavior over time.

We limited our sample to this time period to avoid potential bias resulting from the Christmas season. Nearing the
Christmas holiday, consumers may become more sensitive to brand as a proxy for reliability in delivery time.

These data show a significant dispersion in prices, even for entirely homogeneous physical goods. The
average difference in total price between the lowest priced offer and the tenth lowest priced offer is
$10.77 in our data. In percentage terms, the tenth lowest priced offer is typically 32.3% more expensive
than the lowest priced offer. These results are very similar to Brynjolfsson and Smith (2000, p. 575)
who report an average range of 33% between the highest and lowest book prices obtained from 8
different Internet retailers in 1998-1999.

Table 3 lists selected descriptive statistics for the 6 most popular retailers at
EvenBetter. Column 1 lists estimates of market share in the broader Internet market and column 2 lists
the share of last click-throughs for EvenBetter's consumers. Comparing these two columns yields two
insights into the Internet shopbot market. First, shares of last click-throughs are significantly less
concentrated than estimates of market share in the broader Internet market for books. Second,
click-through shares strongly favor low priced retailers when compared to share estimates in the broader
Internet market. For example, Amazon.com, a relatively high priced retailer, has approximately 75% of
the total Internet book market yet holds only an 8.6% click-through share for EvenBetter's consumers.
At the same time the share positions for three low priced, and relatively unknown, retailers are
dramatically enhanced at EvenBetter.com.

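Table 3: Comparison of Retailers at a Shopbot

Retailer        | Internet Market Share (Est.)* | Shopbot Last Click Share | Proportion of Lowest Prices | Click-Sales Conversion Ratio
----------------|-------------------------------|--------------------------|-----------------------------|-----------------------------
Amazon.com      | 75%                           | 8.6%                     | 2.0%                        | .484
BN.com          | 8%                            | 7.4%                     | 3.1%                        | .461
Borders.com     | 5%                            | 10.9%                    | 9.8%                        | .456
A1 Books        | <1%                           | 10.0%                    | 12.5%                       | N/A
Kingbooks       | <1%                           | 9.8%                     | 15.1%                       | .486
1Bookstreet.com | <1%                           | 5.9%                     | 8.3%                        | .509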
* Internet market share is compiled from press reports and an analysis of click-through data from prior research
(Brynjolfsson and Smith 2000).

Limiting our data in this way allows us to focus our attention on a homogeneous consumer segment (U.S.-based
consumers who reveal an intention to purchase). However, future research could analyze the differences between
U.S. and foreign retailers, or the decision to click-through as a function of product price or product availability.
One explanation for this difference is that the lower search costs offered by shopbots make it easier for
consumers to locate and evaluate unbranded retailers and this changes their choice behavior from what
it would have been if no shopbots were available. To the extent that this explanation holds, it supports
the hypothesis that shopbots are a "great equalizer" in Internet markets, putting small retailers on a more
equal footing with their larger and more well known competitors. It is also possible that because
EvenBetter's consumers are highly price sensitive they are more inclined to shop at low priced retailers
than consumers in the broader market.

However, while shopbot consumers appear to be price sensitive, 51% of them choose an offer that is
not the lowest price returned in a search. Although the books offered are completely homogeneous,
factors other than price influence consumer choice in this setting. Our descriptive data suggest that
retailer brand identity is at least one of the factors influencing consumer behavior. This can be seen by
comparing columns 2 and 3 in Table 3. These columns show that while branded retailers have the
lowest price for only 15% of the book searches they make up 27% of consumer choices. Likewise, the
top three unbranded retailers, who have the lowest price 36% of the time, make up only 26% of
consumer choices. The advantage held by branded retailers can also be seen by examining the offer
price premium, the difference between the lowest priced offer and the price of the offer actually
selected. For branded retailers this difference averages $3.99 while for unbranded retailers it averages
$2.58, a difference of $1.41.
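The branded and unbranded shares quoted above can be re-derived by summing the relevant Table 3 entries. The following is a minimal sketch in Python; the dictionary layout, the retailer name spellings, and the helper `group_totals` are illustrative assumptions, while the percentages are the Table 3 figures.

```python
# Table 3 figures per retailer:
# (share of last click-throughs in %, share of searches with the lowest price in %)
table3 = {
    "Amazon.com": (8.6, 2.0),
    "BN.com": (7.4, 3.1),
    "Borders.com": (10.9, 9.8),
    "A1 Books": (10.0, 12.5),
    "Kingbooks": (9.8, 15.1),
    "1Bookstreet.com": (5.9, 8.3),
}
BRANDED = {"Amazon.com", "BN.com", "Borders.com"}


def group_totals(rows, names):
    """Sum click-through share and lowest-price share over a group of retailers."""
    clicks = sum(rows[name][0] for name in names)
    lowest = sum(rows[name][1] for name in names)
    return round(clicks, 1), round(lowest, 1)


branded_clicks, branded_lowest = group_totals(table3, BRANDED)
unbranded_clicks, unbranded_lowest = group_totals(table3, set(table3) - BRANDED)

# Gap between the average offer price premiums quoted in the text.
premium_gap = round(3.99 - 2.58, 2)

print(branded_lowest, branded_clicks)      # lowest-price share vs. choice share, branded
print(unbranded_lowest, unbranded_clicks)  # lowest-price share vs. choice share, unbranded
print(premium_gap)
```

Rounded to whole percentages, the sums correspond to the 15% versus 27% and 36% versus 26% comparisons in the text.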

Our descriptive statistics also give insight into consumer purchase behavior. Because our choice data
only track click-throughs, our empirical results only predict factors that drive traffic to a site, not
necessarily factors that drive sales. However, the descriptive statistics in column 4 of Table 3 suggest
that traffic is a relatively unbiased indicator of actual sales. These ratios are constructed by comparing
the number of sales at a particular retailer during September and October 1999 to the number of last


We refer to Amazon.com, Barnesandnoble.com, and Borders.com as "branded retailers." Using almost any
reference point, these are the most heavily advertised and well-known retailers in the Internet book market. For
example, based on a search of AltaVista.com, these 3 retailers make up 97% of the total number of Internet links to
EvenBetter's retailers. Similarly, based on a search of Lexis-Nexis, these retailers make up 93% of the references in the
press to EvenBetter's retailers.
click-throughs recorded for that retailer during the same time period. These statistics do not vary
significantly across branded and unbranded retailers, supporting the interpretation of our results with
regard to the behavior that influences sales.

Descriptive statistics provide a useful first step in analyzing consumer choice data. However, definitive
conclusions  are  only  possible  through  systematic  empirical  models  that  control  for  the  effect  of  other 
aspects  of  the  product  bundle.  In  the  next  section  we  discuss  two  systematic  empirical  models  that  can 
be  used  to  analyze  our  research  questions. 

3.     Methodology 

As  noted  above,  our  research  goal  is  to  analyze  how  consumers  respond  to  different  aspects  of  a 
product  bundle  including  brand  name,  retailer  loyalty,  partitioned  prices,  and  contractible  and  non- 
contractible  product  characteristics.  There  are  a  variety  of  choice  models  available  to  analyze  these 
questions  in  a  multidimensional  choice  setting.  We  discuss  the  two  most  prominent  models  below  —  the 
multinomial  logit  and  nested  logit  models  —  as  an  introduction  to  our  analysis.  We  also  provide  brief 
descriptions  of  multinomial  probit  as  an  alternate  empirical  model  and  Hierarchical  Bayesian  Estimation 
as  an  alternate  estimation  technique. 

As discussed below, the availability of a nested logit model to control for concerns about the
independence of irrelevant alternatives, the applicability of aggregate response in the shopbot market,
and the limited availability of longitudinal individual-level choice data lead us to conclude that
logit-based models and maximum likelihood estimation techniques are the most appropriate analysis
techniques for our research questions.

EvenBetter has associate program relationships with many retailers listed at their service. These programs provide
EvenBetter with commissions on the sales driven through EvenBetter's site. As a reporting function, the retailers
provide summaries of the sales that occurred through EvenBetter's service for a particular month, allowing us to
create sales-to-click ratio statistics. A1 Books does not have an associate program relationship based on sales and
therefore we are unable to construct sales-to-click ratios for this retailer.
3.1.    Multinomial  Logit  Model

Given  the  parallels  between  our  data  and  scanner  data,  the  multinomial  logit  model  —  the  workhorse  of 
the  scanner  data  literature  (e.g.,  Guadagni  and  Little  1983,  Kamakura  and  Russell  1989,  Fader  and 
Hardie  1996)  —  provides  a  natural  empirical  starting  point  for  our  analysis.  We  describe  the  nature  of 
this  model  briefly  below  and  refer  the  interested  reader  to  Ben-Akiva  and  Lerman  (1985)  or  McFadden 
(1974)  for  more  detailed  treatments  of  the  model. 

In a choice setting, the multinomial logit model can be motivated by assuming consumers make choices
by first constructing a latent index of utility ($U_{it}$) for each offer ($t$) in each session ($i$) based on the offer's
characteristics and the consumer's preferences. We model the consumer's utility for each offer as the
sum of a systematic component ($V_{it}$) and a stochastic component ($\varepsilon_{it}$):

$$U_{it} = V_{it} + \varepsilon_{it} \qquad (1)$$

The  stochastic  disturbance  can  be  motivated  from  a  variety  of  perspectives  (Manski  1973);  for  our 
purposes  the  two  most  natural  motivations  are  (1)  unobserved  taste  variation  across  consumers  and  (2) 
measurement  error  in  evaluating  offers. 

We further express $V_{it}$ as a linear combination of the product's attributes ($x_{it}'$) and the consumer's
preferences for those attributes ($\beta$). Equation (1) then becomes

$$U_{it} = x_{it}'\beta + \varepsilon_{it} \qquad (2)$$

To  justify  this  starting  point  we  note  that,  while  modeling  consumer  choices  in  terms  of  latent  utility 
indexes  is  accepted  practice  in  the  marketing  and  economics  literature,  its  use  may  be  particularly 
applicable  in  our  setting.  By  listing  offers  in  a  comparison  matrix  with  separate  values  for  a  variety  of 
product  attributes  EvenBetter's  comparison  matrix  lends  itself  to  a  rational,  attribute-based  evaluation 
by  consumers. 

The coefficients in (2) could be readily estimated using standard least squares techniques if the
researcher could observe $U_{it}$ directly. Unfortunately, this is not generally the case in practice. Instead
we typically observe only the resulting choice in session $i$: $y_i = t$. However, under the assumption of
utility maximization, we can infer that $y_i = t$ if and only if $U_{it} = \max(U_{i1}, U_{i2}, \ldots, U_{iT_i})$. Thus, we can
write the probability that offer $t$ is chosen in session $i$ as:

$$P_{it}(x_{it}, \beta) = \Pr\{U_{it} = \max(U_{i1}, U_{i2}, \ldots, U_{iT_i})\} \qquad (3)$$

Using  (2)  this  can  be  rewritten  as: 

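$$\Pr\{\varepsilon_{it} - \varepsilon_{i1} > -(x_{it} - x_{i1})'\beta,\ \varepsilon_{it} - \varepsilon_{i2} > -(x_{it} - x_{i2})'\beta,\ \ldots,\ \varepsilon_{it} - \varepsilon_{iT_i} > -(x_{it} - x_{iT_i})'\beta\} \qquad (4)$$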

The  multinomial  logit  model  assumes  that  the  disturbance  terms  are  independent  random  variables  with  a 
type  I  extreme  value  distribution 

$$\Pr\{\varepsilon_{it} \le \tau\} = \exp\left(-e^{-\mu\tau}\right) \qquad (5)$$

where $\mu$ is an arbitrary scale parameter. This distribution is motivated, in part, because it is an
approximation to a normal distribution. However, the assumption has the even more desirable property
that it dramatically simplifies (4) to the following form (McFadden 1974):

$$P_{it} = \frac{\exp(\mu x_{it}'\beta)}{\sum_{s=1}^{T_i} \exp(\mu x_{is}'\beta)} \qquad (6)$$

This formula has all the desirable properties of a purchase probability: it is always positive, it sums to 1
over all the $T_i$ offers in session $i$, and it is invariant to scaling.
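To make the logit formula concrete, the sketch below computes choice probabilities of the form exp(mu * x'beta) divided by the sum of the same expression over all offers in a session. It is an illustration rather than the estimation code behind the paper: the function name `logit_probabilities` and the three-offer session are hypothetical, and the two weights are borrowed from the total price and branded retailer coefficients reported later in Table 4, column 1.

```python
import math


def logit_probabilities(offers, beta, mu=1.0):
    """Multinomial logit choice probabilities for the offers in one session.

    offers: list of attribute vectors x_it, one per offer
    beta:   preference weights, same length as each attribute vector
    mu:     scale parameter of the extreme value disturbance
    """
    utilities = [mu * sum(b * x for b, x in zip(beta, offer)) for offer in offers]
    top = max(utilities)  # subtract the maximum before exponentiating for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical session; attributes are (total price in dollars, branded dummy).
session = [(30.00, 0), (31.00, 1), (33.50, 0)]
beta = (-0.252, 0.284)  # total price and branded retailer weights (Table 4, column 1)

probs = logit_probabilities(session, beta)
print(probs)
```

Every probability is strictly positive and the probabilities sum to one. In this made-up session the branded offer priced $1.00 higher receives a slightly larger probability than the cheapest unbranded offer, because the brand weight more than offsets the price difference.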

Our goal is to determine the $\beta$ vector, the weights on the consumers' evaluation of offers.
Unfortunately, we estimate $\mu\beta$. Since $\mu$ is present in each of the $\beta$ terms it is not identifiable. However,
since its purpose is to place a scale on the utility of the model, we can arbitrarily set it to any positive real number
(Ben-Akiva and Lerman 1985, p. 107) to identify the $\beta$ coefficients. While this is a benign assumption
in the multinomial logit model, it has implications for our ability to compare coefficients in the nested logit
model, which we now discuss.
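The identification point can be illustrated with a small simulation. The sketch below is a simplified stand-in for the paper's estimation, under stated assumptions: price is the only attribute, sessions are simulated rather than observed, and the likelihood is maximized by Newton-Raphson. All function names and parameter values are hypothetical.

```python
import math
import random


def probs(prices, theta):
    """Logit probabilities when price is the only attribute; theta stands for mu * beta."""
    v = [theta * p for p in prices]
    top = max(v)
    w = [math.exp(x - top) for x in v]
    total = sum(w)
    return [x / total for x in w]


def log_likelihood(sessions, theta):
    """Sum of log probabilities of the observed choices."""
    return sum(math.log(probs(prices, theta)[chosen]) for prices, chosen in sessions)


def score_and_hessian(sessions, theta):
    """First and second derivatives of the log-likelihood with respect to theta."""
    g = h = 0.0
    for prices, chosen in sessions:
        p = probs(prices, theta)
        mean = sum(pi * x for pi, x in zip(p, prices))
        g += prices[chosen] - mean
        h -= sum(pi * (x - mean) ** 2 for pi, x in zip(p, prices))
    return g, h


def fit(sessions, theta=0.0, max_iter=100, tol=1e-10):
    """Newton-Raphson iterations on the concave log-likelihood."""
    for _ in range(max_iter):
        g, h = score_and_hessian(sessions, theta)
        if h == 0.0:
            break
        step = g / h
        theta -= step
        if abs(step) < tol:
            break
    return theta


def simulate(n_sessions, mu, beta, seed=0):
    """Simulate sessions of five offers; choices depend on mu and beta only through mu * beta."""
    rng = random.Random(seed)
    sessions = []
    for _ in range(n_sessions):
        prices = [round(rng.uniform(20.0, 45.0), 2) for _ in range(5)]
        chosen = rng.choices(range(5), weights=probs(prices, mu * beta))[0]
        sessions.append((prices, chosen))
    return sessions


sessions = simulate(1000, mu=2.0, beta=-0.125)
theta_hat = fit(sessions)
print(theta_hat)
```

Because the choice probabilities depend on `mu` and `beta` only through their product, data simulated with (mu = 2, beta = -0.125) cannot be distinguished from data simulated with (mu = 1, beta = -0.25). The estimator recovers a single number, `theta_hat`, which is why the scale must be fixed by normalization.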
3.2.    Nested  Logit  Model

The parsimony of the multinomial logit formula comes at a cost. The assumption that errors are
independent across offers gives rise to the Independence of Irrelevant Alternatives (IIA) characteristic in
the multinomial logit model. Simply put, the IIA problem is that the probability ratio of choosing between
two offers depends only on the attributes of those two offers and not on the attributes of any other
offers in the choice set. Using equation (6) this can be expressed as:

$$\frac{P_{it}}{P_{is}} = \frac{\exp(\mu x_{it}'\beta)}{\exp(\mu x_{is}'\beta)} = \exp\left(\mu (x_{it} - x_{is})'\beta\right) \qquad (7)$$

This restriction is violated if the error independence assumption does not hold. The error independence
assumption might be violated if subsets of alternatives in the consumer's choice set are similar to one
another. This problem may impact our data if consumers perceive different branded (or unbranded)
retailers as offering similar service levels. For example, a consumer who placed a high value on offers
from Amazon.com may also place a high value on offers from BarnesandNoble.com or Borders.com. In
this case, the cross-elasticity between offers is not equal but rather is much higher among branded
retailers than it is between branded and unbranded retailers (and potentially vice-versa).
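The IIA property is easy to see numerically. In the sketch below, which uses hypothetical systematic utilities, adding a second branded offer leaves the odds between the original two offers unchanged under the multinomial logit model, even if consumers would in fact treat the two branded offers as close substitutes.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit probabilities from systematic utilities (scale normalized to 1)."""
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical systematic utilities for one unbranded and one branded offer.
v_unbranded, v_branded = -7.40, -7.55

two_offers = logit_probabilities([v_unbranded, v_branded])
# Add a second branded offer with the same systematic utility as the first.
three_offers = logit_probabilities([v_unbranded, v_branded, v_branded])

ratio_before = two_offers[0] / two_offers[1]
ratio_after = three_offers[0] / three_offers[1]
print(ratio_before, ratio_after)
```

Both ratios equal exp(0.15). The new offer draws share proportionally from the unbranded and the branded offer alike, which is the substitution pattern the nested logit model relaxes.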

The solution to this problem is to place similar offers in common groups, or nests, such that the IIA
assumption is maintained within nests while the variance is allowed to differ between nests. Thus, the
consumer can be modeled as facing an initial choice $S$ (e.g., $S$ = {branded retailers, unbranded
retailers}) followed by a restricted choice $R$ (e.g., $R$ = {{amazon, barnesandnoble, borders},
{a1books, kingbooks, 1bookstreet, ...}}).

Given this decision model we represent the choice set for consumer $n$ as the Cartesian product of the
sets $S$ and $R$ minus the set of all alternatives that are infeasible for individual $n$, or $C_n = S \times R - C_n^*$. We
further define the marginal brand choice set, $S_n$, to be the set of all brand options corresponding to at


A two-level nested model is chosen here for expositional simplicity and its applicability to our setting. Nested
models containing 3 or more levels are simple extensions of the two-level nested logit model (see Goldberg 1995 for an
empirical example of a five-level model).



least one element of $C_n$ and the conditional retailer choice set, $R_{ns}$, as the subset of all retailers available
to consumer $n$ conditional on the consumer making brand choice $s$.

We then model the utility associated with a choice of brand category and retailer as

$$U_{sr} = V_s + V_r + V_{sr} + \varepsilon_s + \varepsilon_r + \varepsilon_{sr} \qquad (8)$$

where $V_s$ and $V_r$ are the systematic utilities associated with the choice of brand and retailer respectively
and $V_{sr}$ is the systematic utility associated with the joint choice of brand and retailer. The error terms are
defined similarly as the random components of utility associated with the choice of brand, retailer, and
the joint choice of brand and retailer.

We  additionally  assume  that 

1. $\mathrm{var}(\varepsilon_r) = 0$, which is equivalent to assuming independence of choice alternatives in the bottom level
nest (Guadagni and Little 1998);

2. $\varepsilon_s$ and $\varepsilon_{sr}$ are independent for brand and retailer selections in the consumer's choice set;

3. the $\varepsilon_{sr}$ terms are independent and identically Gumbel distributed with a scale parameter $\mu^r$; and

4. the $\varepsilon_s$ terms are distributed such that $\max_{r \in R_{ns}} U_{sr}$ is Gumbel distributed with a scale parameter of $\mu^s$.

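Given these assumptions, the choice of retailer conditional on the choice of brand at the lower level nest
becomes

$$P_n(r \mid s) = \frac{\exp[\mu^r (V_r + V_{sr})]}{\sum_{r' \in R_{ns}} \exp[\mu^r (V_{r'} + V_{sr'})]} \qquad (9)$$

which is simply the standard logit model.
Similarly, the choice of brand category becomes

$$P_n(s) = \frac{\exp[\mu^s (V_s + V_s')]}{\sum_{s' \in S_n} \exp[\mu^s (V_{s'} + V_{s'}')]} \qquad (10)$$

where

$$V_s' = \frac{1}{\mu^r} \ln \sum_{r \in R_{ns}} \exp[\mu^r (V_r + V_{sr})] \qquad (11)$$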

As in the multinomial logit model, the coefficients we estimate are confounded with the scale parameter
($\mu^r$). Because $\mu^r$ is constant within nests, it is possible to analyze the $\beta$ parameters within nests.
However, the scale parameter will not be constant across nests in general, making it impossible to
directly compare coefficients across nests (Swait and Louviere 1993). It is, however, possible to
compare shared coefficients by normalizing to a common reference point. We discuss this in more detail
in the analysis section.
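A two-level nested logit of the kind described above can be sketched as follows. The code assumes the standard formulation in which each nest's inclusive value is the log-sum of its scaled lower-level utilities divided by the lower-level scale. The function name `nested_logit`, the dictionary layout, and all utilities and scale values are hypothetical and chosen only for illustration.

```python
import math


def nested_logit(nests, mu_s=1.0):
    """Joint probabilities P(s) * P(r | s) for a two-level nested logit.

    nests maps a nest name s to a dict with keys:
      "V_s":    systematic utility of the nest itself
      "mu_r":   lower-level scale parameter for that nest
      "offers": {retailer: V_r + V_sr}
    """
    inclusive = {}
    for s, nest in nests.items():
        total = sum(math.exp(nest["mu_r"] * v) for v in nest["offers"].values())
        inclusive[s] = math.log(total) / nest["mu_r"]  # inclusive value of nest s

    upper = {s: math.exp(mu_s * (nests[s]["V_s"] + inclusive[s])) for s in nests}
    upper_total = sum(upper.values())

    probabilities = {}
    for s, nest in nests.items():
        lower = {r: math.exp(nest["mu_r"] * v) for r, v in nest["offers"].items()}
        lower_total = sum(lower.values())
        for r, weight in lower.items():
            probabilities[(s, r)] = (upper[s] / upper_total) * (weight / lower_total)
    return probabilities


# Hypothetical utilities and scales, for illustration only.
example = {
    "branded": {"V_s": 0.3, "mu_r": 2.0,
                "offers": {"amazon": -0.2, "barnesandnoble": -0.4, "borders": -0.5}},
    "unbranded": {"V_s": 0.0, "mu_r": 1.5,
                  "offers": {"a1books": 0.1, "kingbooks": 0.0, "1bookstreet": -0.1}},
}
probs = nested_logit(example)
print(probs)
```

When every lower-level scale equals the upper-level scale and the nest-level utilities are zero, the joint probabilities collapse to the ordinary multinomial logit. Allowing `mu_r` to differ by nest is what prevents a direct comparison of raw coefficients across nests.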

3.3.    Alternate  Models  and  Estimation  Techniques

The multinomial probit model (Hausman and Wise 1978) is the most recognized alternative to the
logit-based models of choice described above. This model assumes that the discrete choice errors are
normally distributed. The advantage of this assumption is two-fold. First, it allows for more realistic
correlation structures for the error components, eliminating the IIA problem. Second, and similarly, it
allows for flexible modeling of taste variation across consumers (or other subsets of choice actors).

However, the normality assumption comes at a high cost. It is computationally intensive to evaluate the
higher-order multivariate normal integrals used in the multinomial probit model. Several advances have
been made in the evaluation of these integrals. Hausman and Wise (1978) use a transformation of variables
to reduce the dimensionality of the variance-covariance matrix by one. McFadden (1989) employs a
method of simulated moments using Monte Carlo simulation to eliminate the need for direct estimation of
the likelihood function. However, in spite of these advances, standard multinomial probit estimation
using these techniques remains computationally infeasible for large samples or models with more than a
handful of choice alternatives, making it impractical in our setting. In its place, our use of the nested logit
model should control for IIA concerns across branded and unbranded retailers.



Hierarchical  Bayesian  Estimation  (McCulloch  and  Rossi  1994)  provides  an  individual-level  estimation 
alternative  for  both  logit-  and  probit-based  models.  Hierarchical  Bayesian  Estimation  uses  Bayesian 
techniques  to  estimate  individual-level  responses  for  each  consumer  in  a  sample  (along  with  aggregate 
level  responses).  Moreover,  the  model  makes  probit  estimation  feasible  by  using  the  Gibbs  sampler  to 
generate  an  exact  posterior  distribution  of  the  multinomial  probit  model.  This  avoids  the  computational 
problems  associated  with  estimation  of  the  multinomial  probit  likelihood  function  while  still  allowing  for  a 
correlated  error  structure. 

However, hierarchical Bayesian techniques are typically used to analyze individual-level consumer
response (e.g., Rossi, McCulloch, and Allenby 1996; Montgomery 1997). Given the separation between
shopbots and retailers, individualized pricing strategies are not currently used in shopbot markets, making
Hierarchical Bayesian techniques less appropriate for our analysis. Additionally, most of the customers
in our data set make only a single purchase or have relatively short purchase histories, making
individual-level estimation less reliable. However, with longer purchase histories Hierarchical Bayesian Estimation
may be a useful area for future analysis, especially if shopbots develop individualized
pricing regimes in the future.

4.     Empirical  Results 

Our  analysis  addresses  four  empirical  questions:  consumer  response  to  the  presence  of  brand,  consumer 
response  to  partitioned  pricing  strategies,  consumer  loyalty  to  retailers  they  have  visited  previously,  and 
consumer  response  to  contractible  and  non-contractible  aspects  of  the  product  bundle.  We  also  use  the 
predictive characteristics of our models to assess the reliability of our results and to explore the
potential  for  retailer-based  personalized  pricing  strategies.  We  address  each  of  these  questions  in  turn 
below  using  multinomial  logit  and  nested  logit  models. 

4.1.    Consumer  Response  to  Brand

Retailer brand might matter to consumers of homogeneous physical goods if branded retailers provide
objectively better service quality or if consumers are asymmetrically informed regarding individual
retailers' service quality and are using brand as a proxy for quality. To analyze consumer response to
brand, we capture brand name in two ways: first with a dummy variable that takes on a value of 1 for
branded retailers, and second with separate dummy variables for each of these three retailers
(Amazon.com, BarnesandNoble.com, Borders.com). Results for these models are presented in
Columns 1 and 2 of Table 4 along with other variables that may impact consumer choice: total price,
average delivery time, and delivery "N/A."

As  noted  above,  the  coefficients  listed  in  Table  4  should  be  interpreted  as  preference  weights  in  a  latent 
utility  function.  Thus,  the  negative  coefficient  on  price  indicates  that  higher  prices,  ceteris  paribus,  lead 
to  lower  latent  utilities  and,  as  a  result,  to  fewer  consumer  click-throughs.  Likewise,  longer  delivery 
times  and  not  being  able  to  quote  a  specific  delivery  time  (Delivery  "N/A")  lead  to  lower  latent  utility  in 
the  consumer's  evaluation. 

Table 4: Basic Models of Brand Choice

Variable              | 1            | 2            | 3            | 4            | 5
----------------------|--------------|--------------|--------------|--------------|-------------
Total Price           | -.252 (.001) | -.253 (.001) |              |              |
Item Price            |              |              | -.193 (.001) | -.194 (.001) | -.194 (.001)
Ship Price            |              |              | -.367 (.002) | -.368 (.002) | -.369 (.002)
Sales Tax             |              |              | -.438 (.014) | -.432 (.014) | -.214 (.020)
No Sales Tax (0/1)    |              |              |              |              | .504 (.039)
Average Delivery Time | -.011 (.001) | -.011 (.001) | -.018 (.001) | -.019 (.001) | -.019 (.001)
Delivery "N/A"        | -.417 (.015) | -.420 (.015) | -.368 (.015) | -.374 (.015) | -.370 (.015)
Branded Retailers     | .284 (.014)  |              | .315 (.014)  |              |
Amazon                |              | .467 (.020)  |              | .477 (.020)  | .463 (.020)
BarnesandNoble        |              | .179 (.023)  |              | .177 (.023)  | .185 (.023)
Borders               |              | .186 (.020)  |              | .266 (.020)  | .254 (.020)
Log Likelihood        | -100,706     | -100,630     | -98,054      | -97,990      | -97,906
Adjusted U^2          | .2693        | .2698        | .2885        | .2890        | .2896

* Standard errors listed in parentheses. All results are significant at p<.05. Adjusted U^2 = 1 - (LL(*) - #
variables)/LL(0) (Ben-Akiva and Lerman 1985, p. 167). N = 39,654 sessions.
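The adjusted fit statistic in the table note can be computed as sketched below. Two inputs are assumptions rather than reported values: the null log-likelihood LL(0) is not given in the text, so the figure used here is backed out from column 1, and the variable counts are a reading of which rows are populated in each column of Table 4.

```python
def adjusted_u2(ll_model, n_variables, ll_null):
    """Adjusted U^2 = 1 - (LL(*) - number of variables) / LL(0)."""
    return 1.0 - (ll_model - n_variables) / ll_null


# Assumption: LL(0) is not reported; this value is implied by column 1 of Table 4.
LL0_ASSUMED = -137_827.0

# column: (log likelihood, number of variables as read from the populated rows)
columns = {
    1: (-100_706, 4),
    2: (-100_630, 6),
    3: (-98_054, 6),
    4: (-97_990, 8),
    5: (-97_906, 9),
}

fit_by_column = {
    col: round(adjusted_u2(ll, k, LL0_ASSUMED), 4) for col, (ll, k) in columns.items()
}
print(fit_by_column)
```

Under these assumptions a single value of LL(0) reproduces the adjusted U^2 row for all five specifications, which is a useful consistency check on the table.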


At the same time, consistent with the descriptive data presented in section 2, we find that even after
controlling for price and delivery time brand still has a significant positive effect on latent utility. Each of
the coefficients on brand in specifications 1 and 2 is positive and highly significant, suggesting that
consumers are willing to pay more for offers coming from branded retailers.

The range of quoted delivery times should also impact consumer choice (e.g., 3-7 days versus 1-9 days). However,
measures of delivery range are collinear with delivery time. Because of this, we only analyze average times in this and
subsequent results. Using minimum or maximum delivery times (as opposed to average time) does not substantively
alter our results.
Following Guadagni and Little (1983), we can use the absolute value of the ratio of the coefficient to the
standard error (the t-statistic) to interpret the relative importance of each variable in the consumer's
evaluation of an offer. This comparison is motivated by observing that larger coefficients indicate factors
that are more important in the consumer's evaluation of the offer and more accurately estimated
coefficients indicate factors where there is a high degree of uniformity in response to the variable. Using
this comparison we note that the total price variable has a t-statistic of 176, which is nearly 10 times
larger than the next closest t-statistic. This indicates that an offer's total price is by far the most
important factor consumers use to evaluate offers, supporting the inference that consumers are highly
price sensitive in the shopbot setting.

We can use the relative sizes of the coefficients to gain an idea of the importance of brand name in dollar
terms. This comparison exploits the fact that coefficients in the multinomial logit are product attribute
weights in the consumer's latent utility function. Thus, we can construct counter-factual comparisons of
varying offer characteristics to evaluate the importance of characteristics in dollar terms. For example,
we can ask: Given two offers that are exactly the same with respect to all product attributes, if we
added brand to one offer, how much would we need to decrease the price of the other offer to keep the
latent utility constant? The answer, derived from equation (2) above, is:

$$\Delta p = -\frac{\beta_{BRAND}}{\beta_{PRICE}} \qquad (12)$$

where $\Delta p$ is the required price decrease in dollars and $\beta_{BRAND}$ and $\beta_{PRICE}$ are the coefficients on the
brand dummy and on price.

Using this equation we can use the results from Table 3 column 1 to calculate that offers coming from
one of the three branded retailers have a $1.13 price advantage over unbranded offers. From column 2,
we further infer that offers from Amazon.com have a $1.85 advantage over unbranded retailers, ceteris
paribus, and offers from Barnes and Noble and Borders have an advantage of approximately $0.72
over unbranded retailers. Considering that the average total price of the books chosen by customers in
our sample is $36.80, these figures translate into a 3.1% margin advantage for branded retailers (and a
5.0% margin advantage for Amazon.com) in head-to-head comparisons with unbranded retailers.
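
The margin figures above follow from dividing each dollar advantage by the average total price. A minimal sketch of that arithmetic, using only the dollar values quoted in the text (the function name `margin_advantage` is ours, chosen for illustration):

```python
# Average total price of the books chosen in the sample, as reported in the text.
AVERAGE_TOTAL_PRICE = 36.80


def margin_advantage(dollar_advantage, average_price=AVERAGE_TOTAL_PRICE):
    """Express a dollar price advantage as a fraction of the average total price."""
    return dollar_advantage / average_price


branded = margin_advantage(1.13)  # the three branded retailers taken together
amazon = margin_advantage(1.85)   # Amazon.com alone

print(f"Branded retailers: {branded:.1%}")
print(f"Amazon.com:        {amazon:.1%}")
```

Rounded to one decimal place, these ratios reproduce the 3.1% and 5.0% figures quoted above.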

There are several possible explanations for the price advantage among branded retailers in Internet
markets for homogeneous physical goods. First, branded retailers may provide objectively better
service quality with regard to product delivery, web site ease-of-use, privacy policies, product return
policies, or other service attributes. Retailer differentiation in these service characteristics is consistent
with their strategic goal to mitigate direct price competition (de Figueiredo 2000).

Delivery service is likely to be one of the most important aspects of a retailer's service quality. While
our empirical methodology will control for the quoted delivery time by each retailer, it is possible that
branded retailers are more reliable in meeting their quoted delivery times. To investigate this possibility,
we ordered 5 books, using various shipping services, from the 6 most popular retailers listed at
EvenBetter.com and compared their actual and promised delivery times. Our results are displayed in
Table 5 below. The first column displays the number of books (out of 5) that were delivered before the
first day in the retailer's quoted delivery range. The second column displays the number of books that
were delivered within the quoted delivery time (out of 5) including those that were delivered early. The
third column displays the BizRate.com delivery rating (out of 5) for each retailer. While each of these
three ratings is an imperfect measure of the actual service quality delivered by these retailers, they
do not indicate dramatic differences in service quality between branded and unbranded retailers,
suggesting that heterogeneity in this aspect of service quality may not explain the majority of brand
response observed in our data.

Table 5: Retailer Delivery Accuracy

Retailer          Early Delivery    On-Time Delivery      BizRate.com
                                    (including early)     Delivery Rating
Amazon            3                 5                     4.5
BarnesandNoble    1                 5                     4
Borders           5                 5                     4
A1 Books          5                 5                     3.5
Kingbooks         1                 5                     4.5
1Bookstreet       1                 5                     4

A  second  possible  explanation  for  the  importance  of  brand  concerns  the  information  available  to 
consumers  in  electronic  markets.  It  is  possible  that  service  quality  should  be  modeled  as  an  experience 
good  where  consumers  are  asymmetrically  informed,  ex  ante,  regarding  the  quality  they  will  receive  for 
a  particular  order. 


Note: In the BizRate.com ratings, A1 Books is rated by self-reported experiences from Internet shoppers,
whereas the ratings for the other 5 retailers are based on the experiences of BizRate.com staff members.

Erdem and Swait (1998) use an information economics framework to demonstrate that in markets with
asymmetric information about quality, consumers use brand names as a signal of product quality. These
signals reduce consumers' information acquisition costs, lower the risk they must incur when making
purchases, and ultimately increase their expected utility. Brand signals can be communicated to consumers
through advertising (Milgrom and Roberts 1986) and through prior personal evaluation (Erdem and
Keane 1996).

Extending the information economics model of brand value to the Internet, Erdem et al. (forthcoming)
argue that the Internet may have a differential effect on brand value depending on the nature of the
product: "We expect that for search goods the Internet reduces the importance of brand in its role of
reducing perceived risk. For experience goods ... we expect that the Internet will not reduce (and may
well increase) the importance of a brand in its role of reducing perceived risk" (p. 269).

However, as noted above, the importance of service quality for physical products ordered over the
Internet may cause these products to behave more like experience goods than search goods. This
aspect of Internet markets may differ conceptually from physical world markets to the extent that the
spatial and temporal separation between consumers, retailers, and products in Internet markets
increases the importance of service quality and reduces consumers' ability to evaluate quality prior to
making a purchase (Smith, Brynjolfsson and Bailey 2000). Under this explanation, retailer branding may
remain an important source of competitive advantage for Internet retailers, even in markets served by
shopbots.

It is also possible that our brand name results derive from unobserved loyalty. Because we do not
observe consumer behavior for visits directly to the retailer or for visits to the shopbot outside of our
sample window, consumers may have prior unobserved relationships (and therefore loyalty) that
disproportionately reside with branded retailers. In this case the loyalty effects discussed in section 4.3
will also apply to our brand coefficients.

4.2. Consumer Response to Partitioned Pricing

We also consider consumer response to the elements of total price: item price, shipping cost, and sales
taxes. Prices that are comprised of a base cost and various surcharges are referred to as partitioned
prices in the marketing literature. Morwitz, Greenleaf, and Johnson (1998) analyze partitioned prices in
environments where it is difficult for the consumer to calculate the total price from the presentation of the
base price and surcharge. They find that consumers are less sensitive to the amount of the surcharge
(and therefore surcharges can be an effective pricing strategy for retailers). These results may explain
why Internet retailers commonly use partitioned prices for their web-site direct consumers. Waiting to
present the cost of surcharges such as shipping cost until the final step of a purchase may decrease the
Internet consumer's perception of total price during their evaluation of the product.

However, shopbots present consumers with a very different environment with regard to partitioned
prices. To analyze consumer response to partitioned prices in this setting, columns 3 and 4 of Table 4
separately model consumer response to the elements of total price: item price, shipping price, and sales
tax. In contrast to Morwitz, Greenleaf and Johnson, these results suggest that consumers are nearly
twice as sensitive to changes in shipping price as they are to changes in item price. Column 5 adds a
"no tax" dummy variable that takes on the value 1 when there are no tax charges assessed by the
retailer for that particular consumer. The addition of this variable suggests that conditional on tax being
charged, consumers are no more sensitive to changes in tax than they are to changes in item price.
However, they respond very strongly to the presence of any tax at all in a price (cf. Goolsbee 2000)
and they are still nearly twice as sensitive to changes in shipping price as they are to changes in item
price and sales tax.

Note: In the environments studied by Morwitz, Greenleaf, and Johnson, the total price is hard to determine
either because it is computationally difficult to calculate (base cost plus a percentage) or because it
involves search costs (shipping costs not quoted with base costs).

Note: One could also add a dummy variable for zero shipping charges. However, only one retailer
(1Bookstreet.com) offers free shipping (on book rate packages), so this dummy variable would be
entirely collinear with the presence of 1Bookstreet's brand.

The difference between our results and those of Morwitz, Greenleaf, and Johnson is likely
due to the difference in consumer cognitive processing costs when associating the base price and
surcharge at a retailer's web site and at a shopbot. As noted above, partitioned prices are typically used
in a situation where it is computationally difficult for the consumer to compute the total price from the
separate base price and surcharge information. In contrast, at most shopbots shipping cost and tax are
included in the total price and identified separately in the offer comparison table, making the effect of
shipping cost and tax on the offer price fully observable to the consumer.

Still, finding a higher sensitivity to shipping costs than item price is surprising insofar as it conflicts with
the most straightforward application of utility theory and rational consumer behavior. We would expect
that if there were no cost to calculate the total price, the effect of a $0.01 increase in price would be the
same whether it enters total price through item price or through shipping cost or sales tax. Apparently
this is not the case for at least some of EvenBetter's consumers. There are several possible explanations
for these findings. First, consumers may be considering the fact that shipping and handling charges are
non-refundable in the event that they return the book. In this case, the expected cost of a book would
be

$E(P) = SHIPPING + (1 - \alpha)(ITEM + TAX)$  (13)

where $\alpha$ is the probability of returning the book. However, for this to explain all of the observed
difference in response to item price and shipping costs, consumers would have to estimate that the
probability of making a return is 48% (i.e., $1 - \beta_{ITEM} / \beta_{SHIPPING}$). This is much higher
than the 3-5% return rate observed in the monthly sales reports from EvenBetter.com's associate program
relationships with its retailers.
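
The 48% figure can be traced as follows: under equation (13), a dollar of item price raises expected cost by only (1 - α) dollars while a dollar of shipping raises it by a full dollar, so the item price coefficient should equal (1 - α) times the shipping coefficient. The sketch below solves for α. As an assumption for illustration, it plugs in the item price and shipping price coefficients printed in Table 6 (-.179 and -.342) rather than the Table 4 estimates discussed in the text, which are not reproduced in this section.

```python
def implied_return_probability(beta_item, beta_shipping):
    """Solve beta_item = (1 - alpha) * beta_shipping for alpha."""
    return 1.0 - beta_item / beta_shipping


# Illustrative inputs: item price and shipping price coefficients from Table 6.
alpha = implied_return_probability(beta_item=-0.179, beta_shipping=-0.342)

print(f"Implied return probability: {alpha:.0%}")
```

With these inputs the implied return probability is again roughly 48%, far above the 3-5% return rate reported by the retailers.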

A second explanation for the increased sensitivity of consumers to shipping prices is that consumers are
simply opposed to paying for costs they perceive to be unrelated to the product. A consumer may
perceive that a dollar paid to a publisher (and eventually, in part, to the author) is different than a dollar
paid to a store, a shipper, or to the government (in the case of taxes). Similarly, consumers may object
to prices they believe to be "unfairly" high (Kahneman, Knetsch, and Thaler 1986) such as handling charges
typically added to shipping costs.

Prospect theory (Kahneman and Tversky 1979; Thaler 1985) offers a third possible explanation.
Consumers may be using different reference prices for shipping costs and item prices. For example,
consumers may be using a low (possibly zero) reference price for shipping charges and a higher
reference for item price, having strong negative reactions to increases in price above their reference
price for each price category. A fourth, and closely related, possibility is that consumers evaluate
percentage changes in prices, responding more strongly to an increase in shipping cost from $3 to $4
than an increase in item price from $30 to $31.
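
To make the percentage-change explanation concrete, the short sketch below compares the two $1 increases from the example above in relative terms. The helper name `pct_change` is ours.

```python
def pct_change(old_price, new_price):
    """Relative change in price, as a fraction of the old price."""
    return (new_price - old_price) / old_price


shipping_change = pct_change(3.00, 4.00)   # $1 increase in shipping cost
item_change = pct_change(30.00, 31.00)     # $1 increase in item price

print(f"Shipping: {shipping_change:.1%}, item: {item_change:.1%}")
```

The same $1 increase is ten times larger in relative terms when it falls on the shipping charge, which is the sense in which a consumer who evaluates percentage changes would respond more strongly to shipping.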

A fifth possibility is that consumers are planning to make multiple purchases from the retailer over
several shopping visits, and are taking into account how lower shipping costs will affect their total
purchase price over multiple items.

There may also be other explanations and this finding deserves more study. It would be interesting to
focus on differences in consumer response to partitioned prices between a typical Internet retailer's web
site where base prices and shipping costs are presented separately and a shopbot where they are
presented together. Such an investigation could reveal that retailers should adopt differential pricing
strategies with respect to shipping charges for shopbot consumers and web site direct consumers.
Similarly, one could analyze price comparison behavior among web shoppers from a prospect-theoretic
or cognitive processing context. As noted above, a possible explanation for our results is that customers
respond non-linearly to price changes and have separate mental accounting functions for the different
elements of price. Non-linear response is also seen in the importance of an offer's position in the price
comparison table reflected in Table 10 columns 2-6 and may be explained by prospect theory or the
cognitive processing costs of evaluating additional offers.

4.3.    Retailer  Loyalty 

Our data can also be analyzed to determine the effect of retailer loyalty. Consumers may be loyal to
retailers for a variety of reasons. As noted above, in a setting with asymmetric information regarding
retailer service quality, consumers may use prior experience with a retailer as a signal of service quality
in subsequent purchase occasions. Consumers may also factor in the cost of time to learn how to use a
new retailer site or to enter in the information necessary to establish an account with a new retailer.
Johnson, Bellman, and Lohse (2000) refer to this effect as cognitive lock-in and find that it is a
significant source of web site "stickiness."

Note: EvenBetter offers a (separate) service for consumers making multiple book purchases at the same
time. This service searches for the best deal on the combination of books, even suggesting deals that span
two or more retailers. By not including these consumers in our analysis, we automatically control for the
possibility that the shipping price results in section 4.2 are due to consumers evaluating total shipping
costs on multiple books.

We use the two variables Prior Click and Prior Last Click to analyze the effect of retailer loyalty in our
setting. To simplify interpretation of the coefficients, we limit our analysis to repeat visitors. Our results
from adding these two variables to the previous models are shown in Columns 1 and 2 of Table 6. Here we
find that consumers are much more likely to choose a retailer they have selected on a prior search (Prior
Last Click). In dollar terms, retailers that a consumer had selected previously hold a $2.49 advantage
over other retailers. We also find that consumers who had evaluated, but not selected, a brand (Prior
Click) are statistically no more likely to select that brand on a subsequent visit. This suggests that what
they learned about the brand by visiting the retailer's site has, if anything, a negative effect on subsequent
offer evaluations (consistent with their observed behavior on the initial visit).
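
The $2.49 figure can be reproduced from the column 1 estimates in Table 6 by dividing the Prior Last Click coefficient by the total price coefficient, as in equation (12). A minimal sketch (the function name `dollar_value` is ours):

```python
def dollar_value(beta_attribute, beta_price):
    """Price change that offsets an attribute in latent utility: -beta_attribute / beta_price."""
    return -beta_attribute / beta_price


# Column 1 of Table 6: Prior Last Click = .577, Total Price = -.232.
loyalty_advantage = dollar_value(beta_attribute=0.577, beta_price=-0.232)

print(f"Prior Last Click advantage: ${loyalty_advantage:.2f}")
```

The ratio is independent of the unidentified scale parameter of the logit model, which is why it can be read directly in dollars.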


Table 6: Basic Models of Brand Choice with Loyalty for Repeat Visitors

                        1              2              3              4
Total Price             -.232 (.002)   -.233 (.002)
Item Price                                            -.179 (.002)   -.180 (.002)
Shipping Price                                        -.342 (.003)   -.343 (.003)
Tax                                                   -.163 (.023)   -.164 (.023)
No Tax (0/1)                                          .615 (.048)    .603 (.048)
Average Delivery Time   -.011 (.001)   -.010 (.001)   -.019 (.001)   -.018 (.001)
Delivery "N/A"          -.368 (.018)   -.373 (.018)   -.328 (.018)   -.332 (.018)
Branded Retailers       .296 (.017)                   .314 (.017)
Amazon                                 .499 (.024)                   .482 (.024)
BarnesandNoble                         .252 (.028)                   .254 (.027)
Borders                                .130 (.025)                   .197 (.025)
Prior Last Click        .577 (.028)    .579 (.028)    .547 (.028)    .548 (.028)
Prior Click             -.UV6 (.064)   -.082 (.064)   -.114 (.063)   -.105 (.063)
Log Likelihood          -67,356        -67,287        -65,578        -65,533
Adjusted U^2            .2612          .2620          .2807          .2812

* Standard errors listed in parentheses. Italicized results are insignificant at p<.05. N=26,390
sessions.

These findings are consistent with the importance of cognitive lock-in, web site convenience, and
asymmetric information as sources of competitive advantage in electronic markets. They also help to
quantify the importance of first mover advantage among Internet retailers. Moreover, these results are
obtained from consumers who are likely to be among the least loyal consumers in Internet markets.
According to shopbot managers, many customers use shopbots to locate retailers they are happy with
and, after a period of good service, begin to visit the retailers directly, bypassing the shopbot (and,
regrettably, our data set). Thus, our loyalty results constitute a lower bound on loyalty among typical
Internet customers.

The importance of loyalty in this setting also suggests that shopbots may provide an effective and low
cost avenue for retailers to acquire new consumers and gain competitive advantage against their rivals.
This factor may be particularly important for lesser-known retailers as reflected in the market and
click-through share statistics presented in Table 2.

4.4.     Contractible  and  Non-contractible  Product  Characteristics 

Another aspect of competitive behavior in Internet markets pertains to how consumers respond to
contractible and non-contractible aspects of the product. Contractible aspects of the product bundle
include aspects where consumers have clear avenues of recourse if the retailer does not deliver what
they had promised, such as the characteristics of the physical product or the product's price. Other
aspects of the product bundle, such as delivery time, are non-contractible. It is difficult, if not
impossible, to force the retailers to deliver a product within the time frame quoted to the customer.

In the presence of non-contractible product characteristics, economic theory predicts that consumers
will use a retailer's brand name as a proxy for their credibility in fulfilling their promises on
non-contractible aspects of the product bundle (e.g., Wernerfelt 1988). Moreover, consumers who are
more sensitive to non-contractible aspects of the product bundle should disproportionately use brand in
their evaluation of product offers.

To investigate how customers respond to non-contractible aspects of the product bundle we assume
that consumers who sort the offer comparison tables based on elements of shipping time (e.g., shipping
service, shipping time, and total delivery time) are more sensitive to accuracy in delivery time than
consumers who sort on total price or item price. We then compare the responses of these two sets of
consumers to selected aspects of the product bundle (Table 7).

The selected variables include the differential response of consumers who sort on shipping columns to
the product's item price, shipping price, average delivery time, and a dummy variable identifying
whether the product is sold by a branded retailer. These variables were chosen using a likelihood ratio
test to compare the restricted model (in Table 7) to an unrestricted model where all variables are
allowed to vary between consumers who sort on shipping and consumers who sort on price. The
likelihood ratio test failed to reject (p<.01) the null hypothesis that there is (jointly) no difference in the
response of consumers who sort on shipping and consumers who sort on price to tax, the no tax dummy
variable, delivery "N/A," prior last click, and prior click.
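
A likelihood ratio test compares the log likelihoods of a restricted model and the unrestricted model that nests it. The log likelihood of the unrestricted sorting model is not reported here, so the sketch below illustrates the mechanics on a different pair of numbers that do appear in this section: the log likelihoods of columns 1 and 2 of Table 6. We assume, for illustration only, that column 1 (a single branded-retailer dummy) is nested in column 2 (separate dummies for the three branded retailers), which implies two restrictions. This is not a test reported by the authors.

```python
import math


def likelihood_ratio_statistic(ll_restricted, ll_unrestricted):
    """LR = 2 * (LL_unrestricted - LL_restricted)."""
    return 2.0 * (ll_unrestricted - ll_restricted)


def chi2_sf_two_df(x):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Illustrative inputs: log likelihoods of Table 6, columns 1 and 2.
lr_stat = likelihood_ratio_statistic(ll_restricted=-67_356, ll_unrestricted=-67_287)
p_value = chi2_sf_two_df(lr_stat)

print(f"LR statistic = {lr_stat:.0f}, p-value = {p_value:.3g}")
```

The closed-form tail probability used here holds only for two degrees of freedom; a test with a different number of restrictions would need the general chi-square distribution.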

Table 7: Sorting Based on Shipping versus Price

Coefficients
Item Price                                   -.194 (.001)
Shipping Price                               -.370 (.002)
Tax                                          -.207 (.020)
No Tax (0/1)                                 .524 (.039)
Average Delivery Time                        -.019 (.001)
Delivery "N/A"                               -.369 (.015)
Branded Retailers                            .291 (.014)
Prior Last Click                             .545 (.028)
Prior Click                                  -.126 (.064)

Differential Coefficients for consumers who sort on shipping
Sort on Shipping * Item Price                .080 (.014)
Sort on Shipping * Shipping Price            .296 (.019)
Sort on Shipping * Average Delivery Time     -.053 (.013)
Sort on Shipping * Branded Retailers         .986 (.222)

* Standard errors listed in parentheses. All results are significant at p<.05. N=39,613 sessions
(39,487 sessions sort on total price or item price, 126 sessions sort on shipping time, delivery time,
or shipping service).

Note: With the exception of delivery "N/A," each of the restrictions imposed in Table 7 makes intuitive
sense. There is little reason to believe that consumers who sort on shipping time should respond any
differently to the variables relating to tax or retailer loyalty. The fact that there is no statistical
difference between the two groups' response to delivery "N/A" is more surprising as we would expect
consumers who care about shipping time to be more sensitive to situations where the retailer is unable to
quote an acquisition time.

Our results show that consumers who care about accuracy in delivery time are, not surprisingly, less
sensitive to item price and shipping price and more sensitive to average delivery time. However, these
consumers are also more than four times more sensitive to the presence of brand in an offer than
consumers who sort on price. These results confirm the economic intuition above. Consumers who care
about non-contractible aspects of the product bundle appear to use retailer brand as a proxy for
credibility.
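
The "more than four times" comparison follows from adding the differential brand coefficient to the base brand coefficient in Table 7. The sketch below reads the differential coefficient as .986, assuming the leading decimal point was lost in the printed table.

```python
# Table 7: base coefficient on Branded Retailers and the differential coefficient
# for consumers who sort on shipping (read as .986; the decimal point is assumed).
BRAND_BASE = 0.291
BRAND_DIFFERENTIAL = 0.986

brand_sort_on_price = BRAND_BASE
brand_sort_on_shipping = BRAND_BASE + BRAND_DIFFERENTIAL

ratio = brand_sort_on_shipping / brand_sort_on_price
print(f"Brand coefficient ratio (shipping sorters / price sorters): {ratio:.2f}")
```

Because both coefficients come from the same model, they share a scale parameter and can be compared directly.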

This result may also explain a comparison of our results for frequent versus infrequent visitors. It is
possible that frequent book purchasers are more likely to be sensitive to quality service as a function of
their motivation for making the frequent purchases. To analyze this we classify cookies that appear
only once in our 69-day sample as infrequent visitors and cookies that appear multiple times in our
sample as frequent visitors. We present multinomial logit model results for these two groups of
consumers in Table 8.

Table 8: Comparison of Frequent and Infrequent Visitors

                        Frequent Visitors   Infrequent Visitors
Item Price              -.179 (.002)        -.228 (.003)
Shipping Price          -.343 (.003)        -.423 (.004)
Tax                     -.422 (.017)        -.473 (.025)
Average Delivery Time   -.018 (.001)        -.019 (.001)
Delivery "N/A"          -.330 (.018)        -.448 (.026)
Branded Retailers       .344 (.017)         .260 (.024)

* Standard errors listed in parentheses. Italicized results are insignificant at p<.05. N=26,390 sessions.

As  noted  in  section  3.1,  each  model  has  a  unique  and  unidentified  scale  parameter,  which  prevents  the 
direct  comparison  of  coefficients  across  model  specifications.  However,  it  is  possible  to  compare 
coefficients  across  model  runs  after  normalizing  to  a  common  variable  within  each  specification. 
Normalizing  in  this  manner  cancels  the  scale  parameter  and  provides  a  common  basis  for  comparison. 
In  our  case,  we  normalize  each  coefficient  in  Table  8  as  follows 

$\beta'_{i,s} = -\beta_{i,s} / \beta_{j,s}$  (14)

where j is item price and s ∈ {frequent visitors, infrequent visitors}. Thus, as in equation 12 in section
4.1, we express each coefficient in terms of its dollar value impact on a consumer's evaluation of the
product bundle. Our results from this normalization are shown in Table 9.

To allow for statistical comparison of our normalized coefficients we use the fact that for two random
variables a and b, the variance of f(a,b) is given by (Bevington 1992) as

$\sigma_f^2 = \sigma_a^2 (\partial f / \partial a)^2 + \sigma_b^2 (\partial f / \partial b)^2$  (15)

For f(a,b) = a/b and using our unbiased estimates of standard deviation this simplifies to

$s_f^2 = f^2 [ (s_a / a)^2 + (s_b / b)^2 ]$  (16)

The resulting standard errors ($s_f / \sqrt{n_s}$) are listed in parentheses in Table 9.
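
As a check on the normalization and its standard error, the sketch below recomputes one entry of Table 9 from the rounded Table 8 estimates. It assumes the first-order error propagation formula for a ratio with no covariance term between the two coefficients; that assumption is ours, and small differences from the printed values are expected because the inputs are rounded to three decimals.

```python
import math

# Frequent-visitor estimates (coefficient, standard error) from Table 8.
ITEM_PRICE = (-0.179, 0.002)
BRANDED = (0.344, 0.017)


def normalize(coef, base_coef):
    """Normalized coefficient: -beta_i / beta_j, with j the item price coefficient."""
    return -coef / base_coef


def ratio_standard_error(a, se_a, b, se_b):
    """First-order standard error of a / b, ignoring covariance between a and b."""
    return abs(a / b) * math.sqrt((se_a / a) ** 2 + (se_b / b) ** 2)


estimate = normalize(BRANDED[0], ITEM_PRICE[0])
std_err = ratio_standard_error(BRANDED[0], BRANDED[1], ITEM_PRICE[0], ITEM_PRICE[1])

print(f"Branded Retailers/Item Price: {estimate:.3f} ({std_err:.3f})")
```

The result is close to the 1.916 (.097) reported for frequent visitors in Table 9. Applying the formula directly to standard errors is equivalent to applying it to standard deviations and then dividing by the square root of the sample size.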


Table 9: Comparison of Frequent and Infrequent Visitors, Normalized by Item Price

                                 Frequent Visitors   Infrequent Visitors
Shipping Price/Item Price        -1.911 (.024)       -1.853 (.030)
Tax/Item Price                   -2.355 (.095)       -2.073 (.111)
Avg. Delivery Time/Item Price    -.101 (.004)        -.083 (.005)
Delivery "N/A"/Item Price        -1.840 (.101)       -1.960 (.117)
Branded Retailers/Item Price     1.916 (.097)        1.136 (.108)

* Standard errors listed in parentheses. Italicized results are insignificant at p<.05. N=26,390 for
frequent visitors and 13,264 for infrequent visitors.

In each case, we test the null hypothesis that the normalized coefficients are equal using the standard
t-test for $\mu_a = \mu_b$ with $\sigma_a$ and $\sigma_b$ unknown and $\sigma_a \neq \sigma_b$:

$t = \frac{a - b}{\sqrt{s_a^2 / n_a + s_b^2 / n_b}}$  (17)

with degrees of freedom given by (Satterthwaite 1946)

$\nu = \frac{(s_a^2 / n_a + s_b^2 / n_b)^2}{\frac{(s_a^2 / n_a)^2}{n_a - 1} + \frac{(s_b^2 / n_b)^2}{n_b - 1}}$  (18)

Under this test, we reject the null hypothesis for average delivery time and the presence of brand at
p=0.05, finding instead that frequent visitors are more sensitive to average delivery time and the
presence of brand. We fail to reject the null hypothesis for the normalized coefficients on shipping price,
tax, and delivery "N/A". Consumer response to these variables is statistically the same for frequent
and infrequent visitors. One possible explanation for this finding is that, consistent with the results in
Table 7, frequent purchasers are more sensitive to elements of service quality and this is reflected in
using brand as a proxy for this non-contractible element of the product. We also note that this finding
does not support the conventional wisdom that regular users of shopbots will, over time, rely on brand
less in their purchase behavior.
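
The test decisions above can be retraced from the printed Table 9 values. The sketch below is an illustration under two assumptions of ours: the printed standard errors already equal s divided by the square root of n, so their squares enter the unequal-variance t statistic directly, and the normal critical value is used as an approximation because the Satterthwaite degrees of freedom are in the tens of thousands.

```python
import math
from statistics import NormalDist

# Normalized estimates and standard errors as printed in Table 9:
# name -> ((frequent estimate, SE), (infrequent estimate, SE)).
TABLE_9 = {
    "shipping_price": ((-1.911, 0.024), (-1.853, 0.030)),
    "tax": ((-2.355, 0.095), (-2.073, 0.111)),
    "avg_delivery_time": ((-0.101, 0.004), (-0.083, 0.005)),
    "delivery_na": ((-1.840, 0.101), (-1.960, 0.117)),
    "branded": ((1.916, 0.097), (1.136, 0.108)),
}
N_FREQUENT, N_INFREQUENT = 26_390, 13_264


def welch_t(est_a, se_a, est_b, se_b):
    """Unequal-variance t statistic, with se**2 standing in for s**2 / n."""
    return (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)


def satterthwaite_df(se_a, n_a, se_b, n_b):
    """Satterthwaite approximation to the degrees of freedom, with se**2 = s**2 / n."""
    numerator = (se_a**2 + se_b**2) ** 2
    denominator = se_a**4 / (n_a - 1) + se_b**4 / (n_b - 1)
    return numerator / denominator


# Two-sided 5% critical value (normal approximation to the t distribution).
CRITICAL_5PCT = NormalDist().inv_cdf(0.975)

rejected = {}
for name, ((est_f, se_f), (est_i, se_i)) in TABLE_9.items():
    t_stat = welch_t(est_f, se_f, est_i, se_i)
    df = satterthwaite_df(se_f, N_FREQUENT, se_i, N_INFREQUENT)
    rejected[name] = abs(t_stat) > CRITICAL_5PCT
    print(f"{name:18s} t = {t_stat:6.2f}  df = {df:8.0f}  reject = {rejected[name]}")
```

Under these assumptions equality is rejected only for average delivery time and the brand variable, in line with the results reported above.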

4.5. Model Predictions

An additional aspect of understanding shopbot markets relates to how well the predictions of our
models fit actual consumer behavior both within and outside the time sample. Accurate predictions of
consumer behavior both confirm the validity of our findings and have implications for retailers
considering differential pricing strategies for shopbot markets.

Note: Applying the test methodology of section 4.4 to the unrestricted models for customers who sort on
shipping time and customers who sort on price yields the same results as expressed in Table 7.

To avoid overfitting, it is important to analyze model predictions using a different data sample than the
one used to estimate the model. To account for this, we divide our data into calibration and holdout
samples. Our calibration sample is made up of 15,503 sessions conducted by consumers with odd
numbered cookies between August 25, 1999 and October 18, 1999. We have two types of holdout
samples. An intra-temporal holdout sample is made up of 15,503 sessions conducted by consumers
with even numbered cookies between August 25, 1999 and October 18, 1999. The inter-temporal
holdout sample is made up of 8,648 sessions conducted during the last two weeks of the data set:
October 19, 1999 through November 1, 1999.
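
The sample split described above can be sketched as a simple assignment rule. The representation of a session as an integer cookie number plus a calendar date, and the function and label names, are assumptions made for illustration; they are not taken from the data set itself.

```python
from datetime import date

SAMPLE_START = date(1999, 8, 25)
CALIBRATION_END = date(1999, 10, 18)
SAMPLE_END = date(1999, 11, 1)


def assign_sample(cookie_number: int, session_date: date) -> str:
    """Assign a session to the calibration sample or to one of the two holdout samples."""
    if not SAMPLE_START <= session_date <= SAMPLE_END:
        raise ValueError("session falls outside the sample window")
    if session_date > CALIBRATION_END:
        # Last two weeks of the data: held out regardless of cookie number.
        return "inter_temporal_holdout"
    # Earlier sessions are split by cookie parity.
    return "calibration" if cookie_number % 2 == 1 else "intra_temporal_holdout"


print(assign_sample(101, date(1999, 9, 1)))
print(assign_sample(102, date(1999, 9, 1)))
print(assign_sample(101, date(1999, 10, 25)))
```

Splitting on cookie parity keeps all sessions from one cookie in the same sample, so the intra-temporal holdout tests prediction on different consumers while the inter-temporal holdout tests prediction on a later period.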

Table 10: Extensive Model of Consumer Behavior

Variables             1               2               3               4               5               6
Price
Total Price                           -.062 (.002)    -.061 (.002)
Total Price/Min                                                       -2.254 (.059)
Item Price                                                                            -.049 (.002)
Item Price/Min                                                                                        -.092 (.022)
Shipping (Fast)                                                                       -.131 (.005)    -.109 (.004)
Shipping (Priority)                                                                   -.092 (.007)    -.074 (.007)
Shipping (Bk Rate)                                                                    -.046 (.005)    -.015 (.004)
Sales Tax                                                                             .007 (.026)     -.025 (.021)
No Tax                                                                                .180 (.063)     .036 (.058)
Position in Table
First Price Listed    2.507 (.019)    2.256 (.022)    2.257 (.022)    2.054 (.024)    2.181 (.023)    2.390 (.022)
In First 10 Prices    2.923 (.032)    2.358 (.035)    2.359 (.036)    2.117 (.036)    2.147 (.037)    2.544 (.036)
Delivery Time
Delivery Avg.                         -.029 (.001)    -.029 (.001)    -.028 (.001)    -.035 (.001)    -.037 (.001)
Delivery "N/A"                        -.344 (.035)    -.362 (.036)    -.474 (.037)    -.417 (.036)    -.472 (.035)
Retailer Brand
Amazon.com            1.079 (.039)    1.018 (.045)    .988 (.045)     .980 (.046)     1.038 (.050)    .895 (.048)
BarnesandNoble        .787 (.042)     .591 (.049)     .560 (.050)     .565 (.050)     .623 (.054)     .477 (.052)
Borders               .212 (.039)     .194 (.047)     .166 (.047)     .186 (.048)     .264 (.052)     .145 (.050)
A1 Books              .126 (.039)     .115 (.047)     .090 (.047)     .164 (.047)     .217 (.051)     -.009 (.050)
Kingbooks             -.491 (.039)    -.335 (.044)    -.354 (.044)    -.339 (.045)    -.360 (.048)    -.596 (.047)
1Bookstreet           -.143 (.046)    -.081 (.050)    -.117 (.050)    -.370 (.053)    -.147 (.059)    -.435 (.059)
Alphacraze            -.036 (.048)    .012 (.051)     .018 (.051)     .129 (.052)     .153 (.055)     .020 (.055)
Alphabetstreet        -.847 (.049)    -1.087 (.057)   -1.095 (.058)   -.666 (.056)    -.864 (.058)    -.377 (.053)
Shopping.com          -.203 (.051)    -.356 (.055)    -.367 (.055)    -.301 (.056)    -.283 (.059)    -.430 (.058)
Fat Brain             -.021 (.052)    -.261 (.061)    -.274 (.061)    -.296 (.062)    -.182 (.066)    -.287 (.064)
Classbook.com         .587 (.056)     .368 (.069)     .344 (.070)     .348 (.067)     .098 (.073)     -.234 (.069)
Books.com             -.739 (.056)    -.550 (.059)    -.548 (.059)    -.490 (.060)    -.576 (.061)    -.732 (.060)
Other Retailers       0               0               0               0               0               0
Prior Choices
Prior Last Click                                      .729 (.049)     .644 (.051)     .723 (.049)     .727 (.048)
Prior Click                                           -.112 (.113)    -.154 (.114)    -.114 (.113)    -.107 (.111)

Log Likelihood        -31,255         -30,270         -30,158         -29,749         -29,888         -30,325
Adjusted U^2          .420            .439            .441            .448            .446            .437
AIC                   4.034           3.907           3.893           3.840           3.859           3.915
BIC                   -86,941         -88,882         -89,086         -89,903         -89,577         -88,705
ICOMP                 62,513          60,558          60,337          59,515          59,813          60,681

* Standard errors are listed in parentheses. Italicized results are insignificant at p<.05. (N=15,503 sessions)

Table  10  presents  the  results  from  applying  our  calibration  sample  to  an  extended  model  specification. 
Column 1 presents a minimal model specification using only attribute specific dummy variables (Fader
and Hardie 1996) to model different offers (alternatives). Our attribute specific dummy variables include
the position of the offer in the comparison table and the retailer brand name for all retailers with greater
than 3% last click-through share (12 retailers).

Column  2  adds  coefficients  for  total  price,  average  delivery  time  and  delivery  "N/A".  Column  3  adds 
coefficients  for  prior  last  click  and  prior  click  behavior.  Column  4  replaces  the  coefficient  on  total  price 
with  total  price  as  a  percentage  of  the  lowest  price  available  in  the  search.  Allowing  price  to  enter  as  a 
percentage  of  the  lowest  price  in  a  search  controls  for  prospect  theoretic  effects  (Kahneman  and 
Tversky  1979)  —  in  this  case  the  possibility  that  consumers  may  respond  differently  to  a  $1  price 
increase  on  a  $5  book  than  on  a  $50  book. 
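
The relative-price transformation used in column 4 can be sketched as follows. This is only an illustration: the function name and the sample prices are assumptions made for the example and are not taken from the data.

```python
def relative_prices(total_prices):
    """Divide each offer's total price by the lowest total price in the same search."""
    lowest = min(total_prices)
    if lowest <= 0:
        raise ValueError("prices must be positive")
    return [price / lowest for price in total_prices]


# Hypothetical searches: the same $1 gap is a 20% premium on a $5 book
# but only a 2% premium on a $50 book.
cheap_book = relative_prices([5.00, 6.00])
expensive_book = relative_prices([50.00, 51.00])
```

Entering price this way lets the same absolute price gap carry a different weight depending on the price level of the book being searched.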

Column 5 includes the separate partitioned price variables and the "no tax" dummy variable. To control
for the possibility that our shipping price sensitivity results arise from sensitivity across as opposed to
within shipping service types we include separate variables for the shipping price associated with
express (1-2 day), priority (3-6 day), and book rate (>6 day) shipping types.^18

Results  from  these  more  complete  models  are  ostensibly  the  same  as  the  results  from  the  basic  models 
in  section  3.3.1.  Consumers  respond  strongly  to  branded  retailers,  exhibit  loyalty  to  retailers  they  have 
visited  before,  respond  strongly  to  the  presence  of  sales  tax,  and  remain  more  sensitive  to  price  changes 
in  the  express  and  priority  shipping  categories  than  they  are  to  changes  in  item  price.  However, 
sensitivity  to  changes  in  book  rate  shipping  is  statistically  the  same  as  sensitivity  to  changes  in  item  price. 
This  may  support  the  inference  that  consumers  respond  negatively  to  shipping  charges  they  perceive  to 
be  above  a  retailer's  marginal  cost  since  book  rate  shipping  charges  are  typically  priced  near  cost. 

In  evaluating  the  reliability  of  these  models  we  note  the  standard  errors  are  generally  stable  across 
specifications  suggesting  that  collinearity  is  not  a  significant  problem  in  our  model  specifications.  This 
inference  is  confirmed  in  other  standard  tests  of  data  collinearity.  In  the  next  section  we  discuss  how  to 
choose  among  these  different  specifications  to  determine  the  model  that  best  combines  explanatory 
power  and  parsimony. 


^18 Our results including a single shipping price variable are nearly identical to those reported in Table 4.
4.5.1. Model Selection and Model Fit

Table 10 presents six different model specifications containing different independent variables. Various
alternatives have been offered to choose among model specifications to best combine fit and parsimony.
The most common model selection criteria fall into two categories. The first, log likelihood-based
criteria such as U² measures of fit (McFadden 1974; Ben-Akiva and Lerman 1985, p. 167), select the
model that maximizes the log-likelihood value in maximum likelihood estimation, either ignoring issues of
parsimony or accounting for parsimony by subtracting the number of parameters in the model. The
second category, information theoretic criteria, selects models based on the amount of information in the
data that is explained by the model. By using information theory, these criteria better account for both
the fit and parsimony of the different candidate models. Notable information theoretic measures include
the Akaike Information Criterion or AIC (Akaike 1973), the Bayesian Information Criterion or BIC
(Schwartz 1987, Raftery 1997), and the information theoretic measure of complexity or ICOMP
(Bozdogan 1990; Bearse, Bozdogan, Schlottmann 1997), a more recent test that uses the Fisher
information matrix. These criteria are discussed in more detail in Appendix C.

For each model in Table 10, we present the resulting log-likelihood values; Ben-Akiva and Lerman's
adjusted U²; and the AIC, BIC, and ICOMP information based measures of model selection. In spite
of the very different nature of these selection criteria, they are unanimous in choosing specification 4 as
the "best" specification. These results are better than even the results for the components of price in
columns 5 and 6, suggesting that consumers focus their comparison on total price and that they are more
sensitive to percentage changes in total price than they are to absolute changes. In the next section we
use specification 4 to analyze various measures of the fit and predictive qualities of this model.

Once  a  model  has  been  selected  as  providing  the  best  combination  of  explanatory  power  and 
parsimony,  we  can  evaluate  how  well  the  predictions  made  by  that  model  match  observed  behavior.  To 
conduct  this  evaluation,  we  first  calculate  the  hit  rate  —  the  proportion  of  times  the  prediction  made  by 
the  model  is  the  same  as  a  choice  made  by  the  consumer  (for  the  holdout  sample)  as 

$$\mathrm{HitRate} = \frac{\hat{y}^{\prime} y}{N} \qquad (19)$$

where $\hat{y}$ is a vector which takes on the value of 1 for the offer that has the single highest predicted
choice probability in each session and 0 otherwise, $y$ is a vector that takes on the value of 1 or 0 for
the actual choices made by consumers, and $N$ is the number of sessions in the holdout sample.
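
As a minimal sketch of the hit rate calculation, the following Python function counts the sessions in which the offer with the highest predicted probability is also the offer actually chosen. The data layout (one list of probabilities per session, one chosen index per session) and the sample numbers are assumptions made for illustration only.

```python
def hit_rate(predicted_probs, chosen):
    """Share of sessions where the highest-probability offer was the chosen offer.

    predicted_probs: one list of predicted choice probabilities per session.
    chosen: index of the offer actually selected in each session.
    """
    if len(predicted_probs) != len(chosen) or not chosen:
        raise ValueError("need one chosen index per session")
    hits = 0
    for probs, actual in zip(predicted_probs, chosen):
        # Index of the offer with the highest predicted probability
        # (ties resolve to the first such offer).
        predicted = max(range(len(probs)), key=probs.__getitem__)
        hits += int(predicted == actual)
    return hits / len(chosen)


# Hypothetical holdout sessions, not taken from the data.
sessions = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.4, 0.35, 0.25]]
actual_choices = [0, 2, 2, 0]
example_hit_rate = hit_rate(sessions, actual_choices)
```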

Using this definition, we find a hit rate of .4873 intra-temporally and .4694 inter-temporally for
specification 4 above. These hit rates compare very favorably to hit rates reported in the scanner data
literature. While there is a slight drop in the hit rate for the inter-temporal holdout sample during the
2-week period following our estimation, the hit rate during this 2-week period is still quite high.

Furthermore, this drop in hit rate can be explained by analysis of week-by-week predicted and actual
choice share for EvenBetter.com's consumers. To analyze choice share in this way we use the holdout
sample to calculate predicted share for each brand j in each week k as:

$$s_{jk} = \frac{1}{n_k} \sum_{i=1}^{n_k} p_i \qquad (20)$$

(Guadagni and Little 1983, p. 224) where $p_i$ is the predicted probability that the brand is chosen in each
session and in each week and $n_k$ is the number of sessions in each week. We also use the fact that the
predicted offer selection is a binomially distributed random variable to calculate a standard error for the
predicted share as

$$SE(s_{jk}) = \frac{1}{n_k} \left[ \sum_{i=1}^{n_k} p_i (1 - p_i) \right]^{1/2} \qquad (21)$$


We then graph the predicted and actual choice behavior along with a 90% confidence interval band
(±1.64 × SE(s_jk)) for each of the brands with more than 3% share. The graphs are presented in
Appendix A. The vertical line in the graphs between weeks 8 and 9 represents the difference between
the intra- and inter-temporal holdout samples.
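
The predicted share, its standard error, and the 90% band described above can be sketched in a few lines of Python. The function names and the sample probabilities are illustrative assumptions; the calculation treats each session as an independent Bernoulli draw, as in the standard error formula.

```python
import math


def predicted_share(probs):
    """Return (share, standard error) for one brand in one week.

    probs: predicted probability that the brand is chosen, one value per session.
    """
    n = len(probs)
    share = sum(probs) / n
    # Variance of a sum of independent Bernoulli draws, scaled by 1/n.
    se = math.sqrt(sum(p * (1.0 - p) for p in probs)) / n
    return share, se


def confidence_band(share, se, z=1.64):
    """Symmetric band around the predicted share (z = 1.64 for roughly 90%)."""
    return share - z * se, share + z * se


# Hypothetical week with four sessions, not taken from the data.
share, se = predicted_share([0.2, 0.4, 0.1, 0.3])
lower, upper = confidence_band(share, se)
```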

As with the hit rate calculations above, these graphs show a strong consistency between predicted and
actual share across retailers. Within the time period covered by the calibration sample, our predicted
share is within a 10% error bound of the actual share 98% of the time. During the subsequent two
weeks, the predicted share accuracy declines to 79%.

There are two aspects of the graphs that deserve further explanation. First, there is a strong decline in
the actual (and predicted) share of BarnesandNoble.com during weeks 5 and 6. This drop in share is
due to the fact that EvenBetter.com did not query BarnesandNoble during a significant portion of these
two weeks because of concerns about the accuracy of BarnesandNoble.com's tracking of sales through
their site. After talking with BarnesandNoble managers, EvenBetter realized that the discrepancy was
due to an upgrade at BarnesandNoble's site and that all the data had been recorded correctly, and they
reinstated the retailer.

Second, there is a dramatic increase in Borders' actual share during week 10. Further analysis shows
that on the last three days of the month of October, Borders averaged 21% of last click-throughs (see
Figure A.13). During the first 65 days, Borders' share had averaged 10% (with a daily high of 13% and
a low of 6%). This is displayed in Figure A.13, which shows the consistency of Borders' share until the
end of the month and the return to a "normal" share value on November 1, the last date in our data
sample. (Investigation of the data from November 2 to November 13 shows that Borders' share
remained between 6-8%.)

These statistics, combined with the fact that there is no significant difference in Borders' participation in
sessions, pricing strategies, or shipping policies during this week, suggest that the source of the share
jump is possibly a special temporary promotion on the part of Borders.com that we do not observe in
our data. Unfortunately, efforts to verify this have been unsuccessful. Searches of press articles in
Lexis-Nexis and USENET newsgroup messages during this time period have not revealed any mention of a
special Borders promotion.

However, this change does highlight an interesting fact about this shopbot market. The increase in
Borders' share appears to come at the expense of only Amazon.com's and BarnesandNoble.com's
shares.^19 This suggests that there is a high cross-elasticity among the three branded retailers, indicating
that the IIA assumption, mentioned above, may be too restrictive for our market environment. In the
next section, we attempt to address this concern by modeling the branded and unbranded retailers in
separate nests of the nested logit model.

^19 The drop in BarnesandNoble.com's share during weeks 5 and 6 did not result in a similar change in Amazon's and
Borders' shares because, in the Borders case, we are arguing that customers had different preferences for Borders
offers that appeared in the comparison tables. In contrast, during weeks 5 and 6 the BarnesandNoble offers did not
appear in the tables, and thus our estimates of customer preferences remained accurate for the remaining choices.


4.5.2. Nested Logit Models

As noted in section 3, the nested logit model offers an alternative modeling technique to control for
correlation between the errors of different offers. Our results in section 4.4 suggest that there exist
different error correlation structures for branded and unbranded retailer groups. Thus, a consumer who
places a high value on offers from Amazon.com may also place a high value on offers from
BarnesandNoble.com and Borders. To explore this possibility, we construct a nested logit model by
supposing that consumers first choose whether to purchase from a branded or unbranded retailer and
then choose which offers to select from the subset of offers in their choice set (Figure 2).

Figure 2: Nested Logit Decision Model

[Figure: two-level decision tree. Top level: Choice of Branded or Unbranded Retailer, with branches
Branded Retailers and Unbranded Retailers. Bottom level: Choice of Retailer within each branch.]
At the top level, we model the choice between branded and unbranded retailers as arising from four
variables. First, the difference between the lowest priced branded offer and the lowest priced
unbranded offer when branded retailers have the lowest price and the analogous value when unbranded
retailers have the lowest price. Second, whether the consumer last clicked (or clicked without last
clicking) on a branded or unbranded retailer on their most recent visit. Third, a dummy variable for the
lowest priced category (branded or unbranded). And fourth, a dummy variable for branded retailers.
The variables in the bottom level nests are the same as those in column 4 of Table 8, except that we add
a dummy variable for the offer with the best price in each nest ("Best Price In Nest").

We estimate our nested logit model sequentially as described in Ben-Akiva and Lerman (1985, pp.
297-298) and Guadagni and Little (1998). Sequential estimation produces consistent but asymptotically
inefficient estimates, causing the standard errors to be too small (Amemiya 1978). However, it has been
shown that in many applications the resulting standard errors are not significantly different from those
resulting from Full-Information Maximum Likelihood estimation (Bucklin and Gupta 1992, p. 205).
Given the strong significance of nearly all our coefficient estimates, it is highly unlikely that Full
Information Maximum Likelihood estimation would change our results.
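
The two-stage structure of the decision model can be illustrated with a short Python sketch in which the probability of an offer is the probability of its nest times the probability of the offer within that nest. This is a simplified illustration rather than the estimated model: the utilities are hypothetical inputs, the nest and offer names are placeholders, and any inclusive-value term that a textbook nested logit carries from the bottom level to the top level is assumed to be already folded into the top-level utilities.

```python
import math


def logit_probs(utilities):
    """Logit choice probabilities for a list of utilities (numerically stable)."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def two_level_choice_probs(nest_utils, offer_utils):
    """P(offer) = P(nest) * P(offer | nest).

    nest_utils: dict mapping nest name -> top-level utility.
    offer_utils: dict mapping nest name -> dict of offer name -> bottom-level utility.
    """
    nests = list(nest_utils)
    nest_probs = dict(zip(nests, logit_probs([nest_utils[n] for n in nests])))
    probs = {}
    for nest in nests:
        offers = list(offer_utils[nest])
        within = logit_probs([offer_utils[nest][o] for o in offers])
        for offer, p in zip(offers, within):
            probs[offer] = nest_probs[nest] * p
    return probs


# Hypothetical utilities for illustration only; these are not estimates.
example_probs = two_level_choice_probs(
    {"branded": 0.4, "unbranded": 0.0},
    {
        "branded": {"branded_a": 1.0, "branded_b": 0.0},
        "unbranded": {"unbranded_a": 0.2, "unbranded_b": -0.3},
    },
)
```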

Table 11: Nested Logit Model: Top Nests

| Variable | Coefficient |
|---|---|
| Price Difference if Brand Lowest Price | .033 (.009) |
| Price Difference if Unbranded Lowest Price | .060 (.004) |
| Prior Last Click Brand | .358 (.056) |
| Prior Click Brand | -.323 (.097) |
| Lowest Priced Category | 1.012 (.037) |
| Branded Retailer | .358 (.056) |
| Unbranded Retailer | 0 |

* Standard errors are listed in parentheses. Italicized results are
insignificant at p<0.10. n=39,654 sessions

Our results using the nested logit model are presented in Tables 11 and 12 for the top and bottom level
nests respectively. These results are consistent with the results presented above for the multinomial logit
model: consumers are very sensitive to price (as evidenced by the coefficients on "lowest priced
category," price, and position in table), but still respond strongly to the presence of brand and retailer
loyalty.^20

Table 12: Nested Logit Model: Bottom Nests

| Variable | Branded Retailers | Unbranded Retailers |
|---|---|---|
| Price | | |
| Total Price/Min Total Price | -5.735 (.246) | -1.841 (.066) |
| Position in Table | | |
| First Price Listed | 1.013 (.080) | 1.296 (.096) |
| In First 10 Prices | 1.054 (.076) | 2.366 (.049) |
| Best Price In Nest | .634 (.068) | .897 (.095) |
| Delivery Time | | |
| Delivery Average | -.024 (.003) | -.028 (.002) |
| Delivery "N/A" | -.576 (.121) | -.534 (.043) |
| Retailer Brand | | |
| Amazon.com | 1.267 (.067) | |
| BarnesandNoble | .753 (.069) | |
| Borders | 0 | |
| A1Books | | .130 (.054) |
| Kingbooks | | -.381 (.050) |
| 1Bookstreet | | -.420 (.059) |
| AlphaCraze | | .173 (.056) |
| AlphabetStreet | | -.645 (.060) |
| Shopping.com | | -.341 (.062) |
| Fat Brain | | -.293 (.067) |
| Classbook.com | | .267 (.075) |
| Books.com | | -.548 (.064) |
| Other Retailers | | 0 |
| Prior Choices | | |
| Prior Last Click | .338 (.119) | .712 (.070) |
| Prior Click | -.1^9 (.257) | -.424 (.160) |

* Standard errors are listed in parentheses. Italicized results are
insignificant at p<.05. (Branded Retailers n=4,023, Unbranded Retailers
n=11,480)

In addition, the fit and predictive power of these models are quite good. Our hit rates for the nested logit
results are slightly higher intra-temporally (.4880) and significantly higher inter-temporally (.4855) than
those for the multinomial logit models reported above. The increase in inter-temporal hit rate reflects the
fact that placing the branded retailers in a separate nest improves the predictions for branded retailers
during week 10 when Borders' share increases. The model still does not predict the increase in Borders'
share. However, because the nested logit models elasticity within nests, the actual shares for Amazon
and BarnesandNoble fall within a 10% error bound of the predicted shares during week 10. Predicted
and actual share for branded retailers under the nested logit model are shown in Appendix B. Because
the share predictions for the unbranded retailers are similar to those shown in Appendix A, we suppress
the graphs for these retailers. The similarity in the multinomial and nested logit results with regard to
coefficients and predictions also provides confirmation that the IIA problem does not significantly impact
our previous results.

^20 Because the specifications in Table 12 control for different retailers (by construction) it is infeasible to use the same
techniques presented in section 4.4 to compare coefficients between nests.

One implication of the quality of our inter- and intra-temporal share predictions is that retailers may be
able to use information gathered from Internet shopbots to create personalized prices for shopbot
consumers. Shopbots could arrange to pass information regarding the consumer's prior search behavior
and product characteristics for competing offers to retailers, allowing them to calculate a personalized
price for this consumer to maximize their profits.

Using this information, the retailers could use the multinomial logit equation (equation 6) to calculate the
probability that their offer would be chosen as a function of their price ($p_i$), their product
characteristics ($\phi_i$), the prices and product characteristics of competing offers ($\phi_{-i}$, $p_{-i}$), and the
consumer's characteristics ($\theta$):

$$P(p_i, \phi_i, p_{-i}, \phi_{-i}, \theta) \qquad (22)$$

With this knowledge, the retailer could then choose a price to maximize their profit for this transaction,
where $c$ is the retailer's marginal cost:

$$\max_{p_i} \left[ (p_i - c) \, P(p_i, \phi_i, p_{-i}, \phi_{-i}, \theta) \right] \qquad (23)$$

With an estimate of the annual frequency of the consumer's visits to the shopbot ($F(\theta)$) and the
marginal loyalty advantage from being chosen on this purchase ($A(\theta)$), and a discount rate for future
revenue ($r$), the retailer could instead maximize the net present value of being chosen in the current
transaction:

$$\max_{p_i} \left[ (p_i - c) \, P(p_i, \phi_i, p_{-i}, \phi_{-i}, \theta) + \sum_{t \geq 1} \frac{1}{(1 + r)^t} \, P(p_i, \phi_i, p_{-i}, \phi_{-i}, \theta) \, F(\theta) \, A(\theta) \right] \qquad (24)$$
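
The single-transaction pricing problem in equation 23 can be illustrated with a simple grid search. The sketch below assumes a multinomial logit in which the retailer's own utility is linear in its price; the cost, price coefficient, base utility, and rival utilities are hypothetical values chosen for the example, and the grid search stands in for whatever optimization method a retailer would actually use.

```python
import math


def choice_probability(own_utility, rival_utilities):
    """Multinomial logit probability that the retailer's own offer is chosen."""
    utilities = [own_utility] + list(rival_utilities)
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    return exps[0] / sum(exps)


def best_price(cost, price_coef, base_utility, rival_utilities, grid):
    """Return (price, expected profit) maximizing (price - cost) * P(chosen) over the grid."""
    best = None
    for price in grid:
        prob = choice_probability(base_utility + price_coef * price, rival_utilities)
        profit = (price - cost) * prob
        if best is None or profit > best[1]:
            best = (price, profit)
    return best


# Hypothetical inputs for illustration only.
cost = 10.0
price_coef = -0.2            # assumed disutility per dollar of price
base_utility = 1.0           # assumed utility from brand, delivery time, and so on
rival_utilities = [-1.5, -2.0]
price_grid = [10.0 + 0.25 * i for i in range(81)]   # $10.00 to $30.00
price, profit = best_price(cost, price_coef, base_utility, rival_utilities, price_grid)
```

The same loop extends to equation 24 by adding the discounted loyalty term to the profit expression before comparing candidate prices.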
In implementing a personalized pricing system involving one or multiple retailers, the shopbot would
have to be mindful of the overhead in processing time such a system would impose on their ability to
return prices to their consumers and the privacy concerns of their consumers. Still, employing such a
system would allow shopbots to build lock-in among their consumers and leverage their most important
source of competitive advantage: knowledge of consumer behavior.

5.     Conclusions 

As Internet shopbot technologies mature, consumer behavior at shopbots will become an increasingly
important topic for consumers, retailers, financial markets, and academic researchers.

With regard to consumer behavior, our findings demonstrate that, while shopbots substantially weaken
the market positions of branded retailers, brand name and retailer loyalty still strongly influence
consumer behavior at Internet shopbots. These factors give retailers a 3.1% and 6.8% margin
advantage respectively over their competitors in this setting. Our findings also suggest that consumers
use brand name as a signal of reliability in service quality for non-contractible aspects of the product
bundle. These results may derive from service quality differentiation, asymmetric market information
regarding quality, or cognitive lock-in among consumers.

With  regard  to  retailers,  our  results  suggest  several  differential-pricing  strategies  for  shopbot  markets. 
First,  it  is  likely  that  a  consumer's  willingness  to  take  the  extra  time  to  use  a  shopbot  is  a  credible  signal 
of  price  sensitivity.  Thus,  retailers  may  use  this  information  as  part  of  a  price  discrimination  strategy  — 
charging  lower  prices  to  shopbot  consumers  than  consumers  who  visit  their  web  site  directly.  Second, 
our  findings  suggest  that  partitioned  pricing  strategies  that  increase  demand  among  web  site  direct 
consumers  may  decrease  demand  among  shopbot  consumers.  Because  of  this,  retailers  should  adopt 
different  pricing  strategies  for  shipping  cost  for  shopbot  consumers  than  they  would  for  web  site  direct 
consumers.  Lastly,  the  reliability  of  our  models  when  compared  to  actual  consumer  behavior  suggests 
that  retailers  may  be  able  to  use  shopbot  data  to  provide  personalized  prices  to  consumers. 

For financial markets, our findings may help to focus the debate on the size and sustainability of market
valuations for Internet retailers. Using Amazon.com as an example, our shopbot data indicate that the
retailer maintains a 5.0% margin advantage over unbranded retailers and a 6.8% margin advantage
among repeat visitors. Both of these statistics are likely to represent lower bounds on the actual margin
advantages among their entire consumer base. A margin advantage of this magnitude, if sustainable and
applicable across their entire product line, implies a very large capital value.^21 The relevant questions
then become whether companies such as Amazon.com can sustain current positions of competitive
advantage, how much it will cost to sustain these positions, and whether they can transfer competitive
advantage in one product category to other product categories to expand their revenue base.

Finally, for academic researchers, our results demonstrate the feasibility of using Internet shopping data
to better understand consumer behavior in electronic markets. Future research in this regard may be
able to extend these results to better understand how web-site direct and shopbot consumers respond
to partitioned prices, to evaluate the cognitive processing costs of shopbot consumers, and to
empirically analyze the application of personalized pricing strategies to shopbot consumers. Moreover,
our results suggest that the quantity and quality of data available in Internet markets may introduce a
revolution in the analysis of consumer behavior rivaling that of the scanner data revolution in the 1980s.


^21 For example, Amazon.com reports that 76% of their consumers are repeat visitors, giving them an average margin
advantage of 10.2% on their customer base after combining our brand and loyalty results. Zack's Investment
Research predicts that Amazon.com will grow by an average of 57.9% over the next 5 years. Amazon.com reports net
revenue of $574 million for first quarter 2000 across all product categories. Assuming that Zack's growth projections
hold, that growth stops after 5 years, and assuming a 5% interest rate, the net present value of Amazon.com's 10.2%
margin advantage is over $40 billion.
Appendix A: Week-by-Week Predicted to Actual Choice Share, Multinomial Logit Model

Figure A.1: Amazon.com

Figure A.2: BarnesandNoble.com

Figure A.3: Borders.com

Figure A.4: A1Books.com

Figure A.5: Kingbooks.com

Figure A.6: 1Bookstreet.com

Figure A.7: AlphaCraze.com

Figure A.8: Alphabetstreet.com

Figure A.9: Shopping.com

Figure A.10: FatBrain.com

Figure A.11: Classbook.com

Figure A.12: Books.com

Figure A.13: Borders Last Click-Through Share, 10/19/99 - 11/1/99
Appendix B: Week-by-Week Predicted to Actual Choice Share, Branded Retailers, Nested
Logit Model

Figure B.1: Amazon.com

Figure B.2: BarnesandNoble.com

Figure B.3: Borders.com
Appendix C: Model Selection Criteria

This appendix presents several of the most common model selection criteria applied to multinomial logit
models. As noted above, these criteria fall into two general categories: log likelihood-based measures
and information theoretic measures. Significant criteria from each category are presented in turn below.

The most common model selection criterion is the likelihood ratio test. Likelihood ratio tests can be
used to evaluate multiple restrictions on a model (e.g., Guadagni and Little 1983). Likelihood ratio tests
in this setting are based on the observation that $2(\log L(\theta_A) - \log L(\theta_B)) \sim \chi^2$ with degrees of
freedom equal to the number of restrictions between model A and B.

Applied  to  our  model,  likelihood  ratio  tests  reject  at  any  reasonable  confidence  level  the  restrictions  on 
specification  1  above  with  respect  to  all  other  specifications  and  on  specification  2  with  respect  to 
specification  3.  However,  these  tests  are  only  applicable  where  one  model  can  be  expressed  as  a 
restricted  subset  of  the  second  model.  Therefore  we  cannot  use  likelihood  ratio  tests  to  compare 
specification  3  to  specification  4,  for  example. 
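
As a worked illustration of the likelihood ratio test, the sketch below compares specification 2 (restricted) with specification 3 (unrestricted) using the log-likelihood values reported in Table 10. Specification 3 adds two coefficients (prior last click and prior click), so the statistic is compared against the standard 5% chi-squared critical value for 2 degrees of freedom, which is hard-coded here rather than computed.

```python
def likelihood_ratio_statistic(loglik_unrestricted, loglik_restricted):
    """Twice the difference in maximized log-likelihoods."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


# Log-likelihood values reported in Table 10.
LOGLIK_SPEC2 = -30270.0   # restricted: no prior-choice variables
LOGLIK_SPEC3 = -30158.0   # unrestricted: adds prior last click and prior click

# Standard 5% critical value of the chi-squared distribution with 2 degrees of freedom.
CHI2_CRITICAL_2DF_5PCT = 5.991

lr_statistic = likelihood_ratio_statistic(LOGLIK_SPEC3, LOGLIK_SPEC2)
reject_restriction = lr_statistic > CHI2_CRITICAL_2DF_5PCT
```

The statistic is far above the critical value, in line with the rejection of the restrictions on specification 2 with respect to specification 3.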

Another technique to choose among multinomial logit model specifications is to use a measure of fit
analogous to R² in multivariate linear regressions. McFadden (1974) proposes to measure this value as

$$U^2 = 1 - \frac{\log L(\theta^*)}{\log L(\theta^0)} \qquad (C.1)$$

where $L(\theta^*)$ is the likelihood associated with the specification in question and $L(\theta^0)$ is the likelihood
of the null model (the constrained model excluding all regressors).

Ben-Akiva and Lerman (1985, p. 167) note that this measure will always (weakly) increase when new
variables are added to the model whether or not these variables contribute usefully to explaining the
data. Therefore, this measure does not adequately account for desired parsimony in the selected
specification. For this reason, Ben-Akiva and Lerman adjust McFadden's U² measure to penalize
the addition of variables

$$\bar{U}^2 = 1 - \frac{\log L(\theta^*) - k}{\log L(\theta^0)} \qquad (C.2)$$

where k is the number of independent variables in the model. Using either measure, the best model is the
one with the largest U², corresponding to the model that explains the most variation in the data. Further,
unlike the likelihood ratio presented above, these tests can be used to compare models that cannot be
expressed as restricted subsets of each other.
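
The two fit measures can be sketched directly from their definitions. The log-likelihood values in the example are hypothetical round numbers chosen to make the arithmetic easy to follow; they are not the values from the estimation, since the null-model log-likelihood is not reported here.

```python
def u_squared(loglik_model, loglik_null):
    """McFadden's U-squared: 1 - log L(model) / log L(null)."""
    return 1.0 - loglik_model / loglik_null


def adjusted_u_squared(loglik_model, loglik_null, k):
    """Ben-Akiva and Lerman's adjustment: subtract the k regressors from log L(model)."""
    return 1.0 - (loglik_model - k) / loglik_null


# Hypothetical values for illustration only.
example_u2 = u_squared(-600.0, -1000.0)
example_adjusted_u2 = adjusted_u_squared(-600.0, -1000.0, k=10)
```

With these inputs the adjustment lowers the measure from 0.40 to 0.39, which shows how each added regressor must improve the log-likelihood by more than one unit to raise the adjusted value.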

A  variety  of  model  selection  measures  have  been  proposed  based  on  concepts  of  information  theory. 
The  most  well  known  of  these  measures,  the  Akaike  Information  Criterion  or  AIC  (Akaike  1973)  is 
specified  as 

$$AIC = \frac{-2 \log L(\theta) + 2P}{N} \qquad (C.3)$$

where P is the number of parameters in the model (the number of independent variables plus the slope
coefficient) and N is the number of observations. Intuitively, for models with better fit, $L(\theta)$ should
increase and $-2 \log L(\theta)$ should decrease. The 2P term will decrease with more parsimonious models.
Thus, the "best" model minimizes the AIC criterion.
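
As a check on the AIC definition, the sketch below recomputes the value for specification 1 from the log-likelihood and sample size reported in Table 10. The parameter count P = 15 is an assumption made for this example (2 position dummies and 12 retailer dummies, plus one); it is not stated explicitly in the text.

```python
def aic_per_observation(loglik, n_params, n_obs):
    """AIC scaled by the number of observations: (-2 log L + 2P) / N."""
    return (-2.0 * loglik + 2.0 * n_params) / n_obs


# Log-likelihood and N as reported in Table 10 for specification 1.
# n_params=15 is an assumed count: 14 dummy variables plus one.
aic_spec1 = aic_per_observation(loglik=-31255.0, n_params=15, n_obs=15503)
```

Under this assumption the result rounds to the 4.034 reported for specification 1.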

The Bayesian Information Criterion or BIC (Raftery 1986, Schwartz 1987) provides a similar measure,
based on Bayesian statistical theory. In a Bayesian setting, we compare two models based on the ratio
of their posterior probabilities. If Model 2 is preferred over Model 1 this odds ratio will be greater than
1. The posterior odds ratio of Model 2 to Model 1 can be written as

$$\frac{P(M_2 \mid Data)}{P(M_1 \mid Data)} = \frac{P(Data \mid M_2)}{P(Data \mid M_1)} \cdot \frac{P(M_2)}{P(M_1)} \qquad (C.4)$$

where the first factor on the right hand side of the equation is called the Bayes factor for Model 2
against Model 1 and the second factor is the ratio of the prior probability for Model 2 against Model 1.
In the general case where there is no prior reason to prefer Model 2 over Model 1, this ratio
will be 1 and the posterior odds ratio will be equal to the Bayes factor. Unfortunately, calculating the
Bayes factor is computationally prohibitive.

However, the Bayesian Information Criterion (BIC) presents a useful, and easily calculated,
approximation to the Bayes factor. BIC is defined as

$$BIC = -2 \ln L(\theta) - (N - k) \ln N \qquad (C.5)$$

where $\theta$ and N are defined as above, and k is the number of regressors. Relating this to the Bayes
factor, it can be shown (Raftery 1995) that

$$2 \ln \left( \frac{P(Data \mid M_2)}{P(Data \mid M_1)} \right) \approx BIC_1 - BIC_2. \qquad (C.6)$$

Thus, as with the AIC measure above, the best model is the model that minimizes BIC.
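
The BIC definition can be checked against Table 10 in the same way. The sketch below uses the reported log-likelihood and sample size for specification 1 and assumes k = 14 regressors (2 position dummies and 12 retailer dummies), which is an inference from the description of column 1 rather than a stated figure. Because the reported log-likelihood is rounded, the result matches the tabulated value only approximately.

```python
import math


def bic(loglik, n_regressors, n_obs):
    """BIC as defined above: -2 ln L - (N - k) ln N."""
    return -2.0 * loglik - (n_obs - n_regressors) * math.log(n_obs)


# Log-likelihood and N as reported in Table 10 for specification 1.
# n_regressors=14 is an assumed count of dummy variables.
bic_spec1 = bic(loglik=-31255.0, n_regressors=14, n_obs=15503)
```

Under this assumption the value lies within about one unit of the -86,941 reported for specification 1.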

The information theoretic measure of complexity or ICOMP (Bozdogan 1990; Bearse, Bozdogan,
Schlottmann 1997) provides an alternate model selection criterion. ICOMP uses the Fisher information
matrix to measure (penalize) complexity in the model. The measure is defined as

$$ICOMP = -2 \ln L(\theta) + k \ln \left( \frac{\mathrm{tr}(I^{-1}(\theta))}{k} \right) - \ln \left| I^{-1}(\theta) \right| \qquad (C.7)$$

where $I^{-1}(\theta)$ is the inverse Fisher information matrix. The advantage of ICOMP is that, instead of
viewing complexity as arising from the number of parameters (e.g., U², AIC, BIC), it evaluates model
complexity from the correlation structure of the parameter estimates (through the inverse Fisher
information matrix).